E4979. External Validation and Performance Analysis of a Deep Learning-Based Model for the Detection of Intracranial Hemorrhage
Authors
Ayman Nada;
University of Missouri
Amna Khan;
University of Missouri
Addison Alt;
University of Missouri
Mourad Hamouda;
University of Missouri
Haydi Hassanein;
University of Missouri
Talissa Altes;
University of Missouri
Ayman Gaballah;
UT Southwestern Medical Center
Objective:
Artificial intelligence (AI) systems that can accurately and effectively detect intracranial hemorrhage could be a crucial tool in acute-care settings where a radiologist may not always be available, especially in low-income or rural areas. Moreover, a well-developed AI model could also serve as an educational tool for resident physicians who are developing their detection and diagnostic skills. By alleviating physicians' workload burden, AI radiology applications could greatly reduce physician fatigue and improve patient outcomes. Our goal is to investigate and validate the effectiveness of a commercially available FDA-approved deep learning-based algorithm for detecting intracranial hemorrhage.
Materials and Methods:
This prospective, IRB-approved study included all adult patients (≥ 18 years old) who underwent CT imaging in different clinical settings (i.e., emergency room, inpatient, and outpatient). The study evaluated the performance of a software-based detection system for intracranial hemorrhage on CT scans against the reads of four independent neuroradiologists with 5, 7, 15, and 20 years of experience in neuroradiology. Using the neuroradiologists' reads as the reference, we calculated the sensitivity, specificity, and accuracy of the software-based detection system. We also investigated the accuracy of the algorithm for detecting subtypes of intracranial hemorrhage. Data were analyzed with Microsoft Excel and IBM SPSS v28.
Results:
Our study included 5600 patients (2823 [50.41%] women and 2777 [49.59%] men). The mean age at presentation was 57.97 ± 19.66 years (range 18–104 years). Of the 1022 hemorrhage-positive cases, the software-based detection system correctly detected 909 (89%) and missed 113 (11%; false negatives). Of the 4578 hemorrhage-negative cases, it correctly identified 4382 (96%) as negative and incorrectly flagged 196 (4%) as positive (false positives). The sensitivity of the software was 88.94% (95% CI 86.86–90.80), and the specificity was 95.72% (95% CI 95.09–96.29). Moreover, the algorithm showed high sensitivity and specificity for detecting subtypes of intracranial hemorrhage, such as intraparenchymal and subarachnoid hemorrhages, among others.
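The reported sensitivity and specificity can be checked directly from the four case counts. The short Python sketch below is illustrative only and is not the authors' analysis (which used Excel and SPSS). It reads the 113 cases as missed hemorrhages (false negatives) and the 196 cases as false alarms (false positives), which is the assignment consistent with the reported 88.94% and 95.72%. The accuracy value is derived here from the counts and is not stated in the abstract. The abstract does not name its confidence-interval method, so the Wilson score interval used here is an assumption, and its bounds may differ slightly from the reported ones.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# Case counts from the Results section (reference: neuroradiologist reads).
tp = 909   # hemorrhage present, flagged by the software
fn = 113   # hemorrhage present, missed by the software
fp = 196   # no hemorrhage, flagged by the software
tn = 4382  # no hemorrhage, not flagged

total = tp + fn + fp + tn
sensitivity = tp / (tp + fn)   # share of positive cases detected
specificity = tn / (tn + fp)   # share of negative cases cleared
accuracy = (tp + tn) / total   # share of all cases classified correctly

sens_ci = wilson_interval(tp, tp + fn)
spec_ci = wilson_interval(tn, tn + fp)

print(f"n = {total}")
print(f"sensitivity = {sensitivity:.2%}, 95% CI {sens_ci[0]:.2%} to {sens_ci[1]:.2%}")
print(f"specificity = {specificity:.2%}, 95% CI {spec_ci[0]:.2%} to {spec_ci[1]:.2%}")
print(f"accuracy    = {accuracy:.2%}")
```

With these counts the point estimates match the abstract: the four cells sum to 5600, sensitivity is 88.94%, and specificity is 95.72%.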
Conclusion:
We evaluated the model on a new dataset to test its generalizability, assessing its sensitivity, specificity, and accuracy, as well as its ability to detect different subtypes of intracranial hemorrhage. The sensitivity (88.94%) and specificity (95.72%) observed against neuroradiologist reads provide insight into the performance and generalizability of the deep learning-based model for the detection of intracranial hemorrhage, which can help improve its clinical applications and advance the field of AI in radiology.