E5402. Dataset Examination by Radiologists Reduces Learning Shortcut Bias by AI
Authors
Rahul Sarkar;
McMaster University
Pavneet Bajwa;
No Affiliation
Ranbir Saraon;
McMaster University
Objective:
Artificial intelligence (AI) systems developed for medical imaging tasks may be susceptible to spurious learning shortcuts, in which classification relies on unintended, nonpathologic image features. This can result in a lack of robust performance in clinical deployment. Although computational approaches to reduce shortcut learning have been described, radiologists may readily identify potential sources of bias that are not apparent to nonradiologist data scientists. In this study, we investigated the role of radiologist examination of imaging datasets in identifying learning shortcuts and reducing their impact on AI detection of COVID-19 pneumonia on chest radiographs from the early pandemic.
Materials and Methods:
A literature review was performed to identify relevant studies conducted early in the pandemic (before December 2020). Studies whose datasets were not publicly available, or had been updated or altered by the time of access, were excluded. Datasets from the included studies were evaluated using representative networks to assess model performance, and saliency maps were used to identify confounding data shortcuts. Where saliency maps localized learning shortcuts outside the lung fields, lung segmentation and masking were applied, and model performance was re-evaluated. For each study in which saliency maps identified shortcuts, samples from the control and COVID-19 datasets were independently and blindly reviewed for validity (absence vs presence of possible confounding shortcuts) by a group of radiologists and, separately, by a group of nonradiologist data scientists.
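The saliency-map screening and lung-masking steps above can be illustrated with a toy sketch. It assumes a saliency map has already been computed (the abstract does not name the saliency method) and that a binary lung mask from some segmentation step is available, with 1 marking lung and 0 elsewhere. The helper names `saliency_fraction_outside` and `apply_lung_mask`, the nested-list image representation, and all values are hypothetical and chosen only for illustration. A high share of saliency outside the lung fields is one plausible way to flag a possible shortcut, such as a laterality marker.

```python
from typing import List

# Hypothetical representation: images, saliency maps, and masks as
# row-major nested lists of equal shape (a real pipeline would use arrays).
Grid = List[List[float]]
Mask = List[List[int]]


def _check_same_shape(a: Grid, b: Mask) -> None:
    """Raise if the two grids differ in shape (zip would silently truncate)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grid and mask must have the same shape")


def saliency_fraction_outside(saliency: Grid, lung_mask: Mask) -> float:
    """Fraction of total absolute saliency lying outside the lung mask (mask == 0)."""
    _check_same_shape(saliency, lung_mask)
    total = 0.0
    outside = 0.0
    for s_row, m_row in zip(saliency, lung_mask):
        for s, m in zip(s_row, m_row):
            total += abs(s)
            if m == 0:
                outside += abs(s)
    return outside / total if total > 0 else 0.0


def apply_lung_mask(image: Grid, lung_mask: Mask, fill: float = 0.0) -> Grid:
    """Return a copy of the image with every pixel outside the lung fields set to `fill`."""
    _check_same_shape(image, lung_mask)
    return [
        [px if m else fill for px, m in zip(img_row, m_row)]
        for img_row, m_row in zip(image, lung_mask)
    ]


if __name__ == "__main__":
    # Toy 3x4 "radiograph": the bright top-left pixel stands in for a laterality marker.
    image = [
        [0.9, 0.2, 0.3, 0.1],
        [0.1, 0.5, 0.6, 0.2],
        [0.2, 0.4, 0.5, 0.1],
    ]
    lung_mask = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
    ]
    # Toy saliency map concentrated on the marker rather than on the lungs.
    saliency = [
        [0.8, 0.0, 0.0, 0.0],
        [0.0, 0.1, 0.1, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    print(saliency_fraction_outside(saliency, lung_mask))  # ≈ 0.8
    print(apply_lung_mask(image, lung_mask))
```

After masking, only in-lung pixels carry information, so a model retrained or re-evaluated on masked images can no longer exploit the marker; a drop in performance would then be consistent with the original model having relied on it.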
Results:
Experiments reproduced using datasets from the included studies yielded model performance consistent with published results. However, in studies where saliency maps showed that model predictions relied on shortcuts outside the lung fields (e.g., differences in laterality markers, age-dependent changes in bone development), performance decreased after lung segmentation. Preliminary results show a higher rate of shortcut identification by the radiologist group than by the data scientist group (p = 0.039, kappa = 0.79). This study is ongoing.
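The abstract reports kappa = 0.79 without stating which rater comparison it summarizes, and it does not name the test behind p = 0.039, so neither is reproduced here. For orientation only, the sketch below computes the standard Cohen's kappa for two raters, κ = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. The function name and the ten binary ratings (1 = possible shortcut present, 0 = absent) are hypothetical and are not study data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of both raters' marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    p_e = sum(count_a[c] * count_b[c] for c in labels) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both raters used one identical label throughout;
        # kappa is undefined, and 1.0 is returned here by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical ratings of ten samples (1 = shortcut present, 0 = absent).
    reviewer_1 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    reviewer_2 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
    print(cohens_kappa(reviewer_1, reviewer_2))  # ≈ 0.6
```

Here the raters agree on 8 of 10 samples (p_o = 0.8), while chance agreement is 0.5 because each rater labels half the samples positive, giving κ ≈ 0.6.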
Conclusion:
Preliminary results suggest that radiologist examination of imaging dataset samples used in AI model development helps identify potential spurious learning shortcuts. Such findings can indicate the need to improve dataset quality, apply additional algorithmic techniques to reduce bias (e.g., segmentation), or both, to achieve more robust model performance. This is particularly relevant in early model development for applications with small or rapidly evolving datasets, such as chest radiography in the early COVID-19 pandemic.