Multitask learning for video-based surgical skill assessment

Zhiteng Jian, Wenxi Yue, Qiuxia Wu, Wei Li, Zhiyong Wang, Vincent Lam

    Research output: Chapter in Book/Report/Conference proceeding; Conference proceeding contribution; peer-review

    5 Citations (Scopus)


    Surgical skill assessment (SSA) plays a vital role in medical systems for reducing intraoperative surgical errors and improving clinical outcomes. To ensure objective and efficient SSA, many automatic video-based SSA methods have been developed. In particular, various deep learning methods based on CNNs or RNNs have recently been devised for skill assessment tasks (e.g., skill level prediction). Although predicting overall skill levels and assessing detailed attribute-based scores are highly correlated tasks, most existing studies deal with them separately, without fully exploiting the different information sources encoded in a dataset. In contrast, we propose a novel end-to-end multitask learning framework that performs skill level classification and attribute score regression jointly. Specifically, our network incorporates two branches, one per task, which share earlier layers for feature extraction and have separate prediction layers for their specific targets. The shared feature extractor is optimised under the supervision of both tasks simultaneously, encouraging the model to consider information from different aspects and their relatedness, and so to learn richer and more generalised features. In addition, since not every part of a surgical video contributes equally to skill assessment, we enhance an existing feature extractor, I3D, with a novel Spatio-Temporal Channel Attention Module to emphasise important features. Experimental results on the public dataset JIGSAWS show that our proposed network outperforms state-of-the-art models on both skill classification and score regression tasks.
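
The two-branch design described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the layer sizes, the cross-entropy plus weighted mean-squared-error loss, the `reg_weight` parameter, and all names (`MultitaskSketch`, `joint_loss`) are assumptions made for illustration, and a plain feature vector stands in for the output of the I3D extractor and its attention module, which are not modelled here.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultitaskSketch:
    """Shared layer feeding a classification branch and a regression branch."""

    def __init__(self, n_features, n_hidden, n_levels, n_attributes, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(rng, n_hidden, n_features)      # used by both tasks
        self.cls_head = init_layer(rng, n_levels, n_hidden)      # skill level branch
        self.reg_head = init_layer(rng, n_attributes, n_hidden)  # attribute score branch

    def forward(self, clip_features):
        hidden = relu(linear(clip_features, *self.shared))
        level_probs = softmax(linear(hidden, *self.cls_head))
        attribute_scores = linear(hidden, *self.reg_head)
        return level_probs, attribute_scores


def joint_loss(level_probs, true_level, attribute_scores, true_scores, reg_weight=1.0):
    """Assumed joint objective: cross-entropy plus weighted mean squared error."""
    ce = -math.log(level_probs[true_level])
    mse = sum((p - t) ** 2 for p, t in zip(attribute_scores, true_scores)) / len(true_scores)
    return ce + reg_weight * mse


# Illustrative sizes only: 3 skill levels and 6 attribute scores.
model = MultitaskSketch(n_features=8, n_hidden=4, n_levels=3, n_attributes=6)
features = [0.5] * 8  # stand-in for pooled video features
probs, scores = model.forward(features)
loss = joint_loss(probs, true_level=1, attribute_scores=scores, true_scores=[3.0] * 6)
```

Because both loss terms depend on the shared layer, a gradient step on `loss` would update that layer using supervision from both tasks, which is the mechanism the abstract credits for richer features. How the two terms are actually balanced in the paper is not stated in the abstract.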

    Original language: English
    Title of host publication: 2020 Digital Image Computing
    Subtitle of host publication: Techniques and Applications, DICTA 2020
    Place of Publication: Piscataway, NJ
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)
    Number of pages: 8
    ISBN (Electronic): 9781728191089
    ISBN (Print): 9781728191096
    Publication status: Published - 29 Nov 2020
    Event: 2020 Digital Image Computing: Techniques and Applications, DICTA 2020 - Melbourne, Australia
    Duration: 29 Nov 2020 to 2 Dec 2020


    Conference: 2020 Digital Image Computing: Techniques and Applications, DICTA 2020


    • attention
    • multitask learning
    • surgical skill assessment

