BACKGROUND: Accurate diagnosis and early detection of complex diseases, such as Parkinson's disease, have the potential to be of great benefit to research and clinical practice. We aimed to create a non-invasive, accurate classification model for the diagnosis of Parkinson's disease, which could serve as a basis for future disease prediction studies in longitudinal cohorts. METHODS: We developed a model for disease classification using data from the Parkinson's Progression Markers Initiative (PPMI) study for 367 patients with Parkinson's disease and phenotypically typical imaging data and 165 controls without neurological disease. Olfactory function, genetic risk, family history of Parkinson's disease, age, and gender were algorithmically selected by stepwise logistic regression as significant contributors to our classifying model. We then tested the model with data from 825 patients with Parkinson's disease and 261 controls from five independent cohorts with varying recruitment strategies and designs: the Parkinson's Disease Biomarkers Program (PDBP), the Parkinson's Associated Risk Study (PARS), 23andMe, the Longitudinal and Biomarker Study in PD (LABS-PD), and the Morris K Udall Parkinson's Disease Research Center of Excellence cohort (Penn-Udall). Additionally, we used our model to investigate patients who had imaging scans without evidence of dopaminergic deficit (SWEDD). FINDINGS: In the population from PPMI, our initial model correctly distinguished patients with Parkinson's disease from controls at an area under the curve (AUC) of 0·923 (95% CI 0·900-0·946) with high sensitivity (0·834, 95% CI 0·711-0·883) and specificity (0·903, 95% CI 0·824-0·946) at its optimum AUC threshold (0·655). All Hosmer-Lemeshow simulations suggested that when parsed into random subgroups, the subgroup data matched that of the overall cohort.
External validation showed good classification of Parkinson's disease, with AUCs of 0·894 (95% CI 0·867-0·921) in the PDBP cohort, 0·998 (0·992-1·000) in PARS, 0·955 (no 95% CI available) in 23andMe, 0·929 (0·896-0·962) in LABS-PD, and 0·939 (0·891-0·986) in the Penn-Udall cohort. Four of 17 SWEDD participants whom our model classified as having Parkinson's disease converted to Parkinson's disease within 1 year, whereas only one of 38 SWEDD participants who were not classified as having Parkinson's disease underwent conversion (test of proportions, p=0·003). INTERPRETATION: Our model provides a potential new approach to distinguish participants with Parkinson's disease from controls. If the model can also identify individuals with prodromal or preclinical Parkinson's disease in prospective cohorts, it could facilitate identification of biomarkers and interventions. FUNDING: National Institute on Aging, National Institute of Neurological Disorders and Stroke, and the Michael J Fox Foundation.
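
ILLUSTRATION: The metrics reported under FINDINGS (AUC, and sensitivity and specificity at a probability threshold) can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: the function names `auc` and `sensitivity_specificity` are hypothetical, the scores and labels are invented toy values rather than study data, and only the threshold of 0.655 is taken from the abstract. The AUC here is the rank-based (Mann-Whitney) estimate, which is an assumption about how such a value might be computed.

```python
def auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen case
    scores higher than a randomly chosen control (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def sensitivity_specificity(scores, labels, threshold):
    """Classify as a case when score >= threshold, then return
    (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy predicted probabilities (hypothetical, NOT study data).
# Label 1 = Parkinson's disease, label 0 = control.
toy_scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.66, 0.30, 0.20, 0.10]
toy_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0]

# 0.655 is the threshold reported in the abstract; everything else is assumed.
toy_auc = auc(toy_scores, toy_labels)
toy_sens, toy_spec = sensitivity_specificity(toy_scores, toy_labels, 0.655)
print(f"AUC={toy_auc:.3f} sensitivity={toy_sens:.3f} specificity={toy_spec:.3f}")
```

In practice the scores would be the fitted probabilities from the logistic regression model, and confidence intervals such as those quoted above would need an additional procedure (for example, bootstrapping), which this sketch does not attempt.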