Power comparisons of parametric and rank tests: grouped outcomes with zero-spike

H. M. Hudson*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

Randomized controlled trials are increasingly examining preference data on the minimum benefit required to make treatment worthwhile. These studies assess patients' time and probability trade-offs as outcomes, and require a comparison between the two treatment groups as randomised in the trial. Such data often have specific features that make standard tests of group differences inappropriate. These include:

  • a 'zero spike', i.e. a substantial number of patients judging a trivial survival benefit sufficient;
  • some patients judging the maximum possible benefit insufficient to make treatment worthwhile;
  • a tipping point, i.e. a minimum survival benefit, specific to the patient, that makes treatment worthwhile and determines the patient's elicited response;
  • grouping of the minimum benefit judged sufficient, with outcomes limited to the specific categories (such as "3 months") offered to patients in structured interviews.

Several alternative two-sample tests have been used with such data, but all of them potentially return invalid P-values because these features are incompatible with test assumptions. We report on simulation modelling of the adequacy of bootstrap correction in estimating P-values and power of the alternative procedures. In simulation studies we demonstrate that underlying continuous latent variable, ordinal discrete survival, and mixture distribution models can provide the required comparisons. Corresponding test approaches, parametric and non-parametric, are described. These tests may differ in bias and power. Under models in which a latent variable determines preference, we find little bias in the nominal P-values of the parametric and rank tests considered. Substantial power differences are demonstrated. The superior choice of test is found to depend on the form of model alternative considered. Insight into the appropriate choice of test is gained from consideration of location-shift and polarised alternatives. In particular, the commonly used normal scores and Wilcoxon-Mann-Whitney tests share good performance under translation shift alternatives. However, these tests exhibit poor power in a sample where responses are drawn from two distinct distributions. In such heterogeneous samples, permutation t-test and logrank tests exhibit higher power.
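As a rough illustration of the kind of simulation comparison described in the abstract, the following minimal sketch estimates power for two permutation tests on grouped outcomes with a zero spike. It is not the author's simulation design: the category values, the exponential latent tipping point, the capping of responses at the largest category, and all function names are assumptions made for the example. The mean-difference permutation test stands in for the permutation t-test, and the same test applied to midranks stands in for the Wilcoxon-Mann-Whitney test.

```python
import random

# Hypothetical interview categories in months (an assumption, not from the paper).
CATEGORIES = [0, 3, 6, 12, 24]

def simulate_group(n, zero_prob, mean_latent, rng):
    """Draw n grouped responses: a zero spike plus a latent tipping point."""
    out = []
    for _ in range(n):
        if rng.random() < zero_prob:
            out.append(0)  # trivial benefit judged sufficient
            continue
        latent = rng.expovariate(1.0 / mean_latent)  # assumed latent distribution
        # Smallest offered category that meets the tipping point; responses
        # beyond the largest category are capped there (a simplification).
        out.append(next((c for c in CATEGORIES if c >= latent), CATEGORIES[-1]))
    return out

def midranks(values):
    """Ranks with ties replaced by their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def perm_pvalue(scores, n1, rng, n_perm=200):
    """Two-sided Monte Carlo permutation p-value for the mean-score difference."""
    def stat(s):
        return abs(sum(s[:n1]) / n1 - sum(s[n1:]) / (len(s) - n1))
    observed = stat(scores)
    pool = list(scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        if stat(pool) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def estimate_power(n, zero_a, zero_b, mean_a, mean_b, n_sim=100, alpha=0.05, seed=1):
    """Fraction of simulated trials in which each test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = {"perm_mean": 0, "perm_rank": 0}
    for _ in range(n_sim):
        pooled = (simulate_group(n, zero_a, mean_a, rng)
                  + simulate_group(n, zero_b, mean_b, rng))
        if perm_pvalue(pooled, n, rng) <= alpha:
            rejections["perm_mean"] += 1
        if perm_pvalue(midranks(pooled), n, rng) <= alpha:
            rejections["perm_rank"] += 1
    return {name: count / n_sim for name, count in rejections.items()}

if __name__ == "__main__":
    # Shift in the latent mean with a common zero-spike probability.
    print(estimate_power(30, 0.3, 0.3, 6.0, 12.0))
    # Groups differing only in the size of the zero spike.
    print(estimate_power(30, 0.1, 0.5, 6.0, 6.0))
```

Setting both groups to identical parameters gives a rough check of the rejection rate under the null hypothesis. The two scenarios in the usage lines are illustrative only and do not reproduce the location-shift and polarised alternatives studied in the paper.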

Original language: English
Title of host publication: 18th World IMACS Congress and MODSIM09 International Congress on Modelling and Simulation: Interfacing Modelling and Simulation with Mathematical and Computational Sciences, Proceedings
Editors: R.S. Anderssen, R.D. Braddock, L.T.H. Newham
Place of Publication: Christchurch, NZ
Publisher: Modelling & Simulation Society Australia & New Zealand
Pages: 143-149
Number of pages: 7
ISBN (Print): 9780975840078
Publication status: Published - 2009
Externally published: Yes
Event: 18th World IMACS Congress and International Congress on Modelling and Simulation: Interfacing Modelling and Simulation with Mathematical and Computational Sciences, MODSIM09 - Cairns, QLD, Australia
Duration: 13 Jul 2009 to 17 Jul 2009

Other

Other: 18th World IMACS Congress and International Congress on Modelling and Simulation: Interfacing Modelling and Simulation with Mathematical and Computational Sciences, MODSIM09
Country: Australia
City: Cairns, QLD
Period: 13/07/09 to 17/07/09

Keywords

  • Bias
  • Binned counts
  • Grouping
  • Ordinal data
  • Patient preference data
  • Permutation test
  • Power
  • Rank test
  • Time trade-off
  • Wilcoxon-Mann-Whitney test
