This paper reports the results of a simulation study of the finite-sample performance of a range of approaches for testing multiple-period predictability between two potentially serially correlated time series. In many, but not all, empirically relevant situations, most of the test statistics considered are significantly oversized. In contrast, both an analytical approach proposed in this paper and a bootstrap procedure are found to have accurate empirical sizes. In a small number of cases, the bootstrap is also found to have superior power. The test procedures are applied to an empirical analysis of the predictive power of a Phillips curve model during the 'great moderation' period, which illustrates the practical importance of using test statistics with accurate empirical sizes.