## Abstract

Randomised controlled trials increasingly collect preference data on the minimum benefit patients require to make treatment worthwhile. These studies use patients' time and probability trade-offs as outcomes, and the two treatment groups, as randomised in the trial, must be compared. Such data often have features that make standard tests of group differences inappropriate. These include:

- a 'zero spike': a substantial number of patients judge a trivial survival benefit sufficient;
- some patients judge even the maximum possible benefit insufficient to make treatment worthwhile;
- a tipping point, i.e. a patient-specific minimum survival benefit that makes treatment worthwhile and determines the patient's elicited response;
- grouping of the minimum benefit judged sufficient, with outcomes limited to the specific categories (such as "3 months") offered to patients in structured interviews.

Several alternative two-sample tests have been used with such data, but all of them may return invalid P-values because these features are incompatible with the tests' assumptions. We report simulation modelling of the adequacy of bootstrap correction for estimating the P-values and power of the alternative procedures. In simulation studies we demonstrate that underlying continuous latent variable, ordinal discrete survival, and mixture distribution models can provide the required comparisons. Corresponding parametric and non-parametric test approaches are described; these tests may differ in bias and power. Under models in which a latent variable determines preference, we find little bias in the nominal P-values of the parametric and rank tests considered. Substantial power differences are demonstrated, and the better choice of test is found to depend on the form of model alternative considered. Considering location-shift and polarised alternatives gives insight into the appropriate choice of test.

In particular, the commonly used normal scores and Wilcoxon-Mann-Whitney tests both perform well under translation-shift alternatives. However, these tests have poor power when the responses in a sample are drawn from two distinct distributions. In such heterogeneous samples, the permutation t-test and the logrank test have higher power.
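
The response features described in the abstract (zero spike, ceiling, patient-specific tipping point, grouping) can be illustrated with a small latent-variable simulation. The sketch below is a hypothetical illustration under stated assumptions, not the authors' model: each patient's tipping point is drawn from a normal distribution with an arbitrary mean of 4 and SD of 10 months, and the offered categories (0, 1, 3, 6, 12 and 24 months) are invented for the example. The recorded response is the smallest offered category at or above the tipping point.

```python
import math
import random

# Hypothetical answer categories (months of survival benefit) offered in a
# structured interview; illustrative values, not taken from the paper.
CATEGORIES = [0.0, 1.0, 3.0, 6.0, 12.0, 24.0]
# Code for "even the maximum offered benefit is insufficient".
NOT_WORTHWHILE = math.inf


def elicited_response(tipping_point: float) -> float:
    """Return the smallest offered category at or above the patient's latent
    tipping point. Tipping points <= 0 give the zero spike, values above the
    largest category give NOT_WORTHWHILE, and all others are grouped."""
    for category in CATEGORIES:
        if tipping_point <= category:
            return category
    return NOT_WORTHWHILE


def simulate_arm(n: int, mean: float, sd: float, rng: random.Random) -> list:
    """Draw n latent tipping points from an assumed normal distribution and
    return the grouped responses a structured interview would record."""
    return [elicited_response(rng.gauss(mean, sd)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(2009)
    arm = simulate_arm(1000, mean=4.0, sd=10.0, rng=rng)
    zero_share = sum(r == 0 for r in arm) / len(arm)
    ceiling_share = sum(r == NOT_WORTHWHILE for r in arm) / len(arm)
    print(f"zero spike: {zero_share:.2f}, not worthwhile: {ceiling_share:.2f}")
```

Under these assumed parameters the model puts about a third of the probability on the zero category (P(tipping point <= 0) is about 0.34) and about 2% above the largest offered benefit; changing `mean` and `sd` moves both proportions.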

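The size and power comparisons can be sketched as a plain Monte Carlo experiment. This is a minimal illustration, not the paper's bootstrap-corrected procedure, and it rests on several assumptions: a normal latent tipping point (mean 4, SD 10 months) grouped into hypothetical categories; the 'maximum benefit insufficient' response scored as an arbitrary 36 months so that means are defined; and a 'polarised' alternative modelled as a two-component mixture in which a fraction of treated patients shifts by a large amount while the rest match the control arm. Only two of the tests named above are included: a Wilcoxon-Mann-Whitney test (normal approximation with midranks and tie correction) and a permutation t-test. The permutation test uses the absolute difference in means, which orders relabellings the same way as the pooled-variance two-sample t statistic. The normal scores and logrank tests are omitted for brevity, and no particular ranking of the tests should be read into the printed rates.

```python
import math
import random
from typing import Callable, List

Sample = List[float]
Arm = Callable[[int, random.Random], Sample]

# Hypothetical interview categories (months). The "maximum benefit
# insufficient" response gets an assumed finite score so means are defined.
CATEGORIES = [0.0, 1.0, 3.0, 6.0, 12.0, 24.0]
CEILING_SCORE = 36.0


def grouped(t: float) -> float:
    """Smallest offered category at or above latent tipping point t."""
    return next((c for c in CATEGORIES if t <= c), CEILING_SCORE)


def location_arm(shift: float) -> Arm:
    """Location-shift model: every latent tipping point N(4, 10) moves by shift."""
    return lambda n, rng: [grouped(rng.gauss(4.0 + shift, 10.0)) for _ in range(n)]


def polarised_arm(frac: float, jump: float) -> Arm:
    """Two-component mixture: a fraction frac of patients moves by jump,
    the rest keep the control distribution."""
    return lambda n, rng: [
        grouped(rng.gauss(4.0 + (jump if rng.random() < frac else 0.0), 10.0))
        for _ in range(n)
    ]


def wmw_pvalue(x: Sample, y: Sample, rng: random.Random) -> float:
    """Two-sided Wilcoxon-Mann-Whitney test: normal approximation using
    midranks and the tie-corrected variance (rng is unused)."""
    pooled = sorted(x + y)
    n1, n2, n = len(x), len(y), len(x) + len(y)
    midrank, tie_term, i = {}, 0.0, 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        tie_term += (j - i + 1) ** 3 - (j - i + 1)
        i = j + 1
    u = sum(midrank[v] for v in x) - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:  # every observation tied: no evidence of a difference
        return 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def perm_t_pvalue(x: Sample, y: Sample, rng: random.Random, b: int = 199) -> float:
    """Two-sided Monte Carlo permutation test on |mean(x) - mean(y)|, which
    orders relabellings exactly as the pooled-variance t statistic does."""
    pooled, n1 = x + y, len(x)

    def stat(v: Sample) -> float:
        return abs(sum(v[:n1]) / n1 - sum(v[n1:]) / (len(v) - n1))

    observed = stat(pooled)
    hits = 0
    for _ in range(b):
        rng.shuffle(pooled)  # pooled is a fresh list, so x and y are untouched
        if stat(pooled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (b + 1)


def rejection_rate(test, control: Arm, treated: Arm, n: int, reps: int,
                   alpha: float, rng: random.Random) -> float:
    """Share of simulated trials (n per arm) in which test rejects at alpha:
    the empirical size under a null model, the power under an alternative."""
    hits = 0
    for _ in range(reps):
        if test(control(n, rng), treated(n, rng), rng) <= alpha:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    rng = random.Random(17)
    scenarios = {
        "null": location_arm(0.0),
        "location shift": location_arm(6.0),
        "polarised": polarised_arm(0.25, 24.0),
    }
    for name, treated in scenarios.items():
        for label, test in (("WMW", wmw_pvalue), ("perm t", perm_t_pvalue)):
            rate = rejection_rate(test, location_arm(0.0), treated, 40, 100, 0.05, rng)
            print(f"{name:15s} {label:7s} {rate:.2f}")
```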
Original language | English |
---|---|
Title of host publication | 18th World IMACS Congress and MODSIM09 International Congress on Modelling and Simulation: Interfacing Modelling and Simulation with Mathematical and Computational Sciences, Proceedings |
Editors | R.S. Anderssen, R.D. Braddock, L.T.H. Newham |
Place of publication | Christchurch, NZ |
Publisher | Modelling & Simulation Society Australia & New Zealand |
Pages | 143-149 |
Number of pages | 7 |
ISBN (Print) | 9780975840078 |
Publication status | Published - 2009 |
Externally published | Yes |
Event | 18th World IMACS Congress and International Congress on Modelling and Simulation: Interfacing Modelling and Simulation with Mathematical and Computational Sciences, MODSIM09, Cairns, QLD, Australia; duration: 13 Jul 2009 → 17 Jul 2009 |

### Other

Other | 18th World IMACS Congress and International Congress on Modelling and Simulation: Interfacing Modelling and Simulation with Mathematical and Computational Sciences, MODSIM09 |
---|---|
Country | Australia |
City | Cairns, QLD |
Period | 13/07/09 → 17/07/09 |

## Keywords

- Bias
- Binned counts
- Grouping
- Ordinal data
- Patient preference data
- Permutation test
- Power
- Rank test
- Time trade-off
- Wilcoxon-Mann-Whitney test