Conservation planners represent many aspects of biodiversity by using surrogates with spatial distributions readily observed or quantified, but tests of their effectiveness have produced varied and conflicting results. We identified four factors likely to have a strong influence on the apparent effectiveness of surrogates: (1) the choice of surrogate;(2) differences among study regions, which might be large and unquantified (3) the test method, that is, how effectiveness is quantified, and (4) the test features that the surrogates are intended to represent. Analysis of an unusually rich dataset enabled us, for the first time, to disentangle these factors and to compare their individual and interacting influences. Using two data-rich regions, we estimated effectiveness using five alternative methods: two forms of incidental representation, two forms of species accumulation index and irreplaceability correlation, to assess the performance of 'forest ecosystems' and 'environmental units' as surrogates for six groups of threatened species-the test features-mammals, birds, reptiles, frogs, plants and all of these combined. Four methods tested the effectiveness of the surrogates by selecting areas for conservation of the surrogates then estimating how effective those areas were at representing test features. One method measured the spatial match between conservation priorities for surrogates and test features. For methods that selected conservation areas, we measured effectiveness using two analytical approaches: (1) when representation targets for the surrogates were achieved (incidental representation), or (2) progressively as areas were selected (species accumulation index). We estimated the spatial correlation of conservation priorities using an index known as summed irreplaceability. In general, the effectiveness of surrogates for our taxa (mostly threatened species) was low, although environmental units tended to be more effective than forest ecosystems. The surrogates were most effective for plants and mammals and least effective for frogs and reptiles. The five testing methods differed in their rankings of effectiveness of the two surrogates in relation to different groups of test features. There were differences between study areas in terms of the effectiveness of surrogates for different test feature groups. Overall, the effectiveness of the surrogates was sensitive to all four factors. This indicates the need for caution in generalizing surrogacy tests.