Low-back pain (LBP) is a common condition seen in primary care. A principal aim during a clinical examination is to identify patients with a higher likelihood of underlying serious pathology, such as vertebral fracture, who may require additional investigation and specific treatment. All 'evidence-based' clinical practice guidelines recommend the use of red flags to screen for serious causes of back pain. However, it remains unclear whether the diagnostic accuracy of red flags is sufficient to support this recommendation.
The objective of this review was to assess the diagnostic accuracy of red flags obtained in a clinical history or physical examination to screen for vertebral fracture in patients presenting with LBP.
Electronic databases were searched for primary studies from the earliest date available to 7 March 2012. Forward and backward citation searching of eligible studies was also conducted.
Studies were considered if they compared the results of any aspect of the history, or of any test conducted in the physical examination, of patients presenting for LBP or for examination of the lumbar spine with a reference standard (diagnostic imaging). The selection criteria were independently applied by two review authors.
Three review authors independently conducted 'Risk of bias' assessment and data extraction. Risk of bias was assessed using the 11-item QUADAS tool. Characteristics of studies, patients, index tests and reference standards were extracted. Where available, raw data were used to calculate sensitivity and specificity with 95% confidence intervals (CI). Due to the heterogeneity of studies and tests, statistical pooling was not appropriate and the analysis for the review was descriptive only. Likelihood ratios for each test were calculated and used as an indication of clinical usefulness.
Eight studies were included in the review: four set in primary care, one in secondary care and three in tertiary care (accident and emergency).
Overall, the risk of bias of the studies was moderate, with high risk of selection bias and verification bias being the predominant flaws. Reporting of index and reference tests was poor. The prevalence of vertebral fracture ranged from 6.5% to 11% in accident and emergency settings and from 0.7% to 4.5% in primary care. Twenty-nine groups of index tests were investigated; however, only two featured in more than two studies. Descriptive analyses revealed that three red flags in primary care were potentially useful, with meaningful positive likelihood ratios (LR+) but mostly imprecise estimates (significant trauma, older age and corticosteroid use; LR+ point estimates ranging from 3.42 to 12.85, 3.69 to 9.39 and 3.97 to 48.50, respectively). One red flag in tertiary care appeared informative (contusion/abrasion; LR+ 31.09, 95% CI 18.25 to 52.96). The results of combined tests appeared more informative than individual red flags, with LR+ estimates generally greater in magnitude and precision.
The available evidence does not support the use of many red flags to specifically screen for vertebral fracture in patients presenting for LBP. Based on evidence from single studies, few individual red flags appear informative, as most have poor diagnostic accuracy as indicated by imprecise estimates of likelihood ratios. When combinations of red flags were used, the performance appeared to improve. From the limited evidence, the findings give rise to a weak recommendation that a combination of a small subset of red flags may be useful to screen for vertebral fracture. It should also be noted that many red flags have high false-positive rates; if they were acted upon uncritically, there would be consequences for the cost of management and outcomes of patients with LBP. Further research should focus on appropriate sets of red flags and adequate reporting of both index and reference tests.
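As an illustration of the accuracy measures reported above, the following minimal Python sketch computes sensitivity, specificity and likelihood ratios from a single 2x2 table of a red flag against the imaging reference standard. It is not the review's own analysis code. The function name `diagnostic_accuracy`, the choice of Wilson intervals for sensitivity and specificity, the log-transformation interval for LR+, and the example counts are all assumptions made for illustration; the review does not state which interval methods were used, and the counts are hypothetical rather than taken from any included study.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (assumed method, not from the review)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def diagnostic_accuracy(tp, fp, fn, tn, z=1.96):
    """Accuracy measures for one index test from a 2x2 table.

    tp, fn: patients WITH fracture who are red-flag positive / negative.
    fp, tn: patients WITHOUT fracture who are red-flag positive / negative.
    """
    if min(tp, fp, fn, tn) <= 0:
        # Likelihood ratios and their log-scale intervals are undefined
        # or unstable with empty cells; this sketch simply refuses them.
        raise ValueError("all four cells must be positive in this sketch")

    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec)        # LR+ = sensitivity / (1 - specificity)
    lr_neg = (1 - sens) / spec        # LR- = (1 - sensitivity) / specificity

    # Standard error of ln(LR+) under the usual log-transformation approach.
    se_log_lr_pos = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    lr_pos_ci = (
        lr_pos * math.exp(-z * se_log_lr_pos),
        lr_pos * math.exp(z * se_log_lr_pos),
    )

    return {
        "sensitivity": sens,
        "sensitivity_ci": wilson_interval(tp, tp + fn, z),
        "specificity": spec,
        "specificity_ci": wilson_interval(tn, tn + fp, z),
        "lr_pos": lr_pos,
        "lr_pos_ci": lr_pos_ci,
        "lr_neg": lr_neg,
    }


if __name__ == "__main__":
    # Hypothetical counts: 20 fractures among 500 patients (4% prevalence).
    result = diagnostic_accuracy(tp=8, fp=20, fn=12, tn=460)
    for name, value in result.items():
        print(name, value)
```

With so few fracture cases in the hypothetical table, the interval around LR+ is wide even though the point estimate looks useful, which mirrors the pattern of meaningful but imprecise estimates described for individual red flags.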