TY - JOUR
T1 - Validation of deep learning techniques for quality augmentation in diffusion MRI for clinical studies
AU - Aja-Fernández, Santiago
AU - Martín-Martín, Carmen
AU - Planchuelo-Gómez, Álvaro
AU - Faiyaz, Abrar
AU - Uddin, Md Nasir
AU - Schifitto, Giovanni
AU - Tiwari, Abhishek
AU - Shigwan, Saurabh J.
AU - Kumar Singh, Rajeev
AU - Zheng, Tianshu
AU - Cao, Zuozhen
AU - Wu, Dan
AU - Blumberg, Stefano B.
AU - Sen, Snigdha
AU - Goodwin-Allcock, Tobias
AU - Slator, Paddy J.
AU - Yigit Avci, Mehmet
AU - Li, Zihan
AU - Bilgic, Berkin
AU - Tian, Qiyuan
AU - Wang, Xinyi
AU - Tang, Zihao
AU - Cabezas, Mariano
AU - Rauland, Amelie
AU - Merhof, Dorit
AU - Manzano Maria, Renata
AU - Campos, Vinícius Paraníba
AU - Santini, Tales
AU - da Costa Vieira, Marcelo Andrade
AU - HashemizadehKolowri, Seyyed Kazem
AU - DiBella, Edward
AU - Peng, Chenxu
AU - Shen, Zhimin
AU - Chen, Zan
AU - Ullah, Irfan
AU - Mani, Merry
AU - Abdolmotalleby, Hesam
AU - Eckstrom, Samuel
AU - Baete, Steven H.
AU - Filipiak, Patryk
AU - Dong, Tanxin
AU - Fan, Qiuyun
AU - de Luis-García, Rodrigo
AU - Tristán-Vega, Antonio
AU - Pieciak, Tomasz
N1 - Copyright the Author(s) 2023. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
PY - 2023/1
Y1 - 2023/1
N2 - The objective of this study is to evaluate the efficacy of deep learning (DL) techniques in improving the quality of diffusion MRI (dMRI) data in clinical applications. The study aims to determine whether the use of artificial intelligence (AI) methods in medical images may result in the loss of critical clinical information and/or the appearance of false information. To assess this, the focus was on the angular resolution of dMRI, and a clinical trial was conducted on migraine, specifically comparing episodic and chronic migraine patients. The number of gradient directions had an impact on white matter analysis results, with statistically significant differences between groups being drastically reduced when using 21 gradient directions instead of the original 61. Fourteen teams from different institutions were tasked to use DL to enhance three diffusion metrics (FA, AD and MD) calculated from data acquired with 21 gradient directions and a b-value of 1000 s/mm². The goal was to produce results that were comparable to those calculated from 61 gradient directions. The results were evaluated using both standard image quality metrics and Tract-Based Spatial Statistics (TBSS) to compare episodic and chronic migraine patients. The study results suggest that while most DL techniques improved the ability to detect statistical differences between groups, they also led to an increase in false positives. The results showed that there was a constant growth rate of false positives linearly proportional to the new true positives, which highlights the risk of generalization of AI-based tasks when assessing diverse clinical cohorts and training using data from a single group. The methods also showed divergent performance when replicating the original distribution of the data and some exhibited significant bias. In conclusion, extreme caution should be exercised when using AI methods for harmonization or synthesis when processing heterogeneous data in clinical studies, as important information may be altered, even when global metrics such as structural similarity or peak signal-to-noise ratio appear to suggest otherwise.
AB - The objective of this study is to evaluate the efficacy of deep learning (DL) techniques in improving the quality of diffusion MRI (dMRI) data in clinical applications. The study aims to determine whether the use of artificial intelligence (AI) methods in medical images may result in the loss of critical clinical information and/or the appearance of false information. To assess this, the focus was on the angular resolution of dMRI, and a clinical trial was conducted on migraine, specifically comparing episodic and chronic migraine patients. The number of gradient directions had an impact on white matter analysis results, with statistically significant differences between groups being drastically reduced when using 21 gradient directions instead of the original 61. Fourteen teams from different institutions were tasked to use DL to enhance three diffusion metrics (FA, AD and MD) calculated from data acquired with 21 gradient directions and a b-value of 1000 s/mm². The goal was to produce results that were comparable to those calculated from 61 gradient directions. The results were evaluated using both standard image quality metrics and Tract-Based Spatial Statistics (TBSS) to compare episodic and chronic migraine patients. The study results suggest that while most DL techniques improved the ability to detect statistical differences between groups, they also led to an increase in false positives. The results showed that there was a constant growth rate of false positives linearly proportional to the new true positives, which highlights the risk of generalization of AI-based tasks when assessing diverse clinical cohorts and training using data from a single group. The methods also showed divergent performance when replicating the original distribution of the data and some exhibited significant bias. In conclusion, extreme caution should be exercised when using AI methods for harmonization or synthesis when processing heterogeneous data in clinical studies, as important information may be altered, even when global metrics such as structural similarity or peak signal-to-noise ratio appear to suggest otherwise.
KW - angular resolution
KW - artificial intelligence
KW - deep learning
KW - diffusion MRI
KW - diffusion tensor
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85167438621&partnerID=8YFLogxK
U2 - 10.1016/j.nicl.2023.103483
DO - 10.1016/j.nicl.2023.103483
M3 - Article
C2 - 37572514
AN - SCOPUS:85167438621
SN - 2213-1582
VL - 39
SP - 1
EP - 17
JO - NeuroImage: Clinical
JF - NeuroImage: Clinical
M1 - 103483
ER -