Investigating rater severity/leniency in interpreter performance testing: a multifaceted Rasch measurement approach

Chao Han

    Research output: Contribution to journal › Article › peer-review


    Rater-mediated performance assessment (RMPA) is a critical component of interpreter certification testing systems worldwide. Given the acknowledged rater variability in RMPA and the high-stakes nature of certification testing, it is crucial to ensure rater reliability in interpreter certification performance testing (ICPT). However, a review of current ICPT practice indicates that rigorous research on rater reliability is lacking. Against this background, the present study reports on the use of multifaceted Rasch measurement (MFRM) to identify the degree of severity/leniency in different raters' assessments of simultaneous interpretations (SIs) by 32 interpreters in an experimental setting. Nine raters, specifically trained for the purpose, were asked to evaluate four English-to-Chinese SIs by each of the interpreters, using three 8-point rating scales (information content, fluency, expression). The source texts differed in speed and in the speaker's accent (native vs. non-native). Rater-generated scores were then subjected to MFRM analysis using the FACETS program. The following general trends emerged: 1) homogeneity statistics showed that the raters were not all equally severe overall; and 2) bias analyses showed that a relatively large proportion of the raters had significantly biased interactions with the interpreters and the assessment criteria. Implications for practical rating arrangements in ICPT, and for rater training, are discussed.
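
    For orientation, a many-facet Rasch model for a design like the one described above can be written in its general log-odds form as sketched below. The abstract does not give the model specification actually used, so the choice of facets (interpreter, source text, criterion, rater) and all symbol names here are assumptions made for illustration only.

    ```latex
    % Assumed facets and symbols (not taken from the article):
    %   B_n : ability of interpreter n
    %   T_m : difficulty of source text (task) m
    %   D_i : difficulty of assessment criterion i
    %   C_j : severity of rater j
    %   F_k : difficulty of scale category k relative to category k-1
    %   P_{nmijk} : probability that rater j awards interpreter n
    %               category k on criterion i for source text m
    \[
      \ln\!\left( \frac{P_{nmijk}}{P_{nmij(k-1)}} \right)
        = B_n - T_m - D_i - C_j - F_k
    \]
    ```

    Under this sketch, a larger C_j lowers the log-odds of the higher of two adjacent categories for every interpreter, which is what it means to call rater j more severe. The bias analyses mentioned in the abstract would correspond to testing additional interaction terms (for example, rater by interpreter or rater by criterion) on top of these main effects.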
    Original language: English
    Pages (from-to): 255-283
    Number of pages: 29
    Issue number: 2
    Publication status: Published - 2015


    • interpreter certification
    • performance testing
    • multifaceted Rasch measurement
    • rater severity/leniency
    • rater training
    • rater variability

