Past climates provide a test of models' ability to predict climate change. We present a comprehensive evaluation of state-of-the-art models against Last Glacial Maximum and mid-Holocene climates, using reconstructions of land and ocean climates and simulations from the Palaeoclimate Modelling and Coupled Modelling Intercomparison Projects. Newer models do not perform better than earlier versions despite higher resolution and complexity. Differences in climate sensitivity only weakly account for differences in model performance. In the glacial, models consistently underestimate land cooling (especially in winter) and overestimate ocean surface cooling (especially in the tropics). In the mid-Holocene, models generally underestimate the precipitation increase in the northern monsoon regions, and overestimate summer warming in central Eurasia. Models generally capture large-scale gradients of climate change but have more limited ability to reproduce spatial patterns. Despite these common biases, some models perform better than others.