Weather radar systems are an important tool in commercial aviation for maintaining the safety and security of aircraft. However, their utility depends on the accuracy and reliability with which the displays are interpreted. The primary aim of this study was to determine whether experienced pilots could be clustered on the basis of their assessments of the turbulence associated with simulated weather radar displays, and whether the resulting groups corresponded to differences in experience-related metrics. Sixty-one participants completed a series of online scenarios in which they rated the level of turbulence associated with 11 simulated weather radar displays. They also indicated their confidence in being able to continue the flight for 80 nautical miles without an alteration in track or altitude. A cluster analysis reliably differentiated two groups of participants, and these groups corresponded to differences in the capacity to discriminate between weather radar scenarios. The results also reveal both a lack of reliability in experienced pilots' interpretations of weather radar displays and difficulties in classifying expertise on the basis of experience-related metrics. At an empirical level, the outcomes have implications for assessments of expertise in domains in which ideal performance is difficult to establish. From an industry perspective, the results reveal important differences in the interpretation of weather radar displays amongst experienced, qualified pilots. This suggests a need for more effective weather radar design, complemented by more reliable and comprehensive training that focuses on the accurate interpretation of different types of weather radar returns. Relevance to industry: The research highlights the difficulties that pilots face in interpreting weather radar displays accurately and emphasises the need for new designs and more effective training initiatives.