Abstract
Surgical skill assessment (SSA) plays a vital role in medical systems for reducing intraoperative surgical errors and improving clinical outcomes. To ensure objective and efficient SSA, many automatic video-based SSA methods have been developed. In particular, various deep learning methods based on CNNs or RNNs have recently been devised for skill assessment tasks (e.g., skill level prediction). Although predicting overall skill levels and assessing detailed attribute-based scores are highly correlated tasks, most existing studies deal with them separately, without fully exploiting the different information sources encoded in a dataset. In contrast, we propose a novel end-to-end multitask learning framework that performs skill level classification and attribute score regression jointly. Specifically, our network incorporates two branches, one per task, which share earlier layers for feature extraction and have separate prediction layers for their respective targets. The shared feature extractor is optimised under the supervision of both tasks simultaneously, encouraging the model to consider information from different aspects, and the relatedness between them, so that it learns richer and more generalised features. In addition, since not every part of a surgical video contributes equally to skill assessment, we enhance an existing feature extractor, I3D, with a novel Spatio-Temporal Channel Attention Module to emphasise important features. Experimental results on the public JIGSAWS dataset show that our proposed network outperforms state-of-the-art models on both the skill classification and score regression tasks.
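The two-branch design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a single dense layer stands in for the attention-enhanced I3D extractor, and the names `multitask_loss`, `params`, and `reg_weight`, as well as the choice of cross-entropy plus weighted mean squared error as the joint objective, are assumptions made for illustration only.

```python
import math


def linear(weights, bias, x):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def cross_entropy(logits, label):
    """Negative log-likelihood of `label` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def mse(pred, target):
    """Mean squared error over the attribute scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(params, clip_features, skill_label, attribute_scores,
                   reg_weight=1.0):
    """Joint loss for one clip (hypothetical formulation).

    Both heads read the same shared features, so in a trained model the
    gradient of this sum would update the shared layer from both tasks.
    """
    # Shared feature extractor (stand-in for the video backbone).
    shared = relu(linear(params["shared_w"], params["shared_b"], clip_features))
    # Branch 1: skill level classification.
    class_logits = linear(params["cls_w"], params["cls_b"], shared)
    # Branch 2: attribute score regression.
    score_preds = linear(params["reg_w"], params["reg_b"], shared)
    return (cross_entropy(class_logits, skill_label)
            + reg_weight * mse(score_preds, attribute_scores))
```

Here `reg_weight` is a hypothetical knob for balancing the two tasks; the abstract does not state how, or whether, the two losses are weighted.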
Original language | English |
---|---|
Title of host publication | 2020 Digital Image Computing |
Subtitle of host publication | Techniques and Applications, DICTA 2020 |
Place of Publication | Piscataway, NJ |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Number of pages | 8 |
ISBN (Electronic) | 9781728191089 |
ISBN (Print) | 9781728191096 |
Publication status | Published - 29 Nov 2020 |
Event | 2020 Digital Image Computing: Techniques and Applications, DICTA 2020 - Melbourne, Australia; Duration: 29 Nov 2020 → 2 Dec 2020 |
Conference
Conference | 2020 Digital Image Computing: Techniques and Applications, DICTA 2020 |
---|---|
Country/Territory | Australia |
City | Melbourne |
Period | 29/11/20 → 2/12/20 |
Keywords
- attention
- multitask learning
- surgical skill assessment