E2016. Cross-Check QA: Rapid Detection and Notification System Eliminates Discordance Between Radiologist and Artificial Intelligence-Based Output
UMass Memorial Medical Center
Objective:
The objective of this study was to eliminate inadvertent discordance between radiologist interpretation and artificial intelligence-based decision support system (AI DSS) interpretation of CT scans. We implemented a natural language processing (NLP)-based “safety net” discrepancy detection and notification system that identifies when a radiology report impression is discordant with a high-probability alert generated by the AI DSS for intracranial hemorrhage (ICH), cervical spine fracture, or pulmonary embolus (PE) on CT.
Materials and Methods:
This retrospective study included all consecutive adult CT scans performed at our institution over a 15-month period from February 27, 2020 to May 22, 2021. A total of 29,403 noncontrast head CTs (NCCT), 9697 noncontrast cervical spine CTs, and 7902 CT pulmonary angiograms (CTPA) were analyzed using an FDA-approved AI DSS (Aidoc, Tel Aviv, Israel) for the detection of ICH, cervical spine fracture, and PE, respectively. AI DSS notifications were available to the interpreting radiologist but were not always reviewed. In near real time, the finalized radiologist reports for CT scans flagged as high-probability positives by the AI DSS were analyzed using NLP-based software to assess concordance. On detection of a discrepancy (e.g., an NCCT flagged as high probability for ICH by the AI DSS but interpreted as normal by the radiologist), an email notification was generated and sent to the interpreting radiologist, the division chief, and the radiologist leadership on-call. After re-review by the interpreting radiologist (and/or division chief/radiologist leadership on-call) prompted by the cross-check notification, addenda were placed on the radiology reports where needed.
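The cross-check step described above can be sketched in a few lines of Python. This is a minimal illustration only, not the NLP software used in the study: the keyword lists, the negation rule, the `Study` record, and the recipient addresses are all hypothetical assumptions, and a production system would need far more robust language handling.

```python
import re
from dataclasses import dataclass

# Hypothetical finding terms per alert type (assumed, not from the study).
FINDING_TERMS = {
    "ICH": ["hemorrhage", "hematoma"],
    "C_SPINE_FRACTURE": ["fracture"],
    "PE": ["pulmonary embolus", "pulmonary embolism", "pulmonary emboli"],
}
# Very simple negation cues; real report language is more varied.
NEGATION = re.compile(r"\b(no|without|negative for)\b")


@dataclass
class Study:
    accession: str    # hypothetical study identifier
    alert_type: str   # key into FINDING_TERMS
    ai_flagged: bool  # True if the AI DSS raised a high-probability alert
    impression: str   # finalized report impression text


def report_mentions_finding(impression: str, alert_type: str) -> bool:
    """Return True if a sentence names the finding with no negation cue before it."""
    for sentence in re.split(r"[.;\n]+", impression.lower()):
        for term in FINDING_TERMS[alert_type]:
            idx = sentence.find(term)
            if idx != -1 and not NEGATION.search(sentence[:idx]):
                return True
    return False


def is_discordant(study: Study) -> bool:
    """AI flagged the study, but the report does not affirm the finding."""
    return study.ai_flagged and not report_mentions_finding(
        study.impression, study.alert_type
    )


def build_notification(study: Study, recipients: list[str]) -> dict:
    """Assemble (but do not send) a cross-check email as a plain dict."""
    return {
        "to": recipients,
        "subject": f"Cross-check: possible {study.alert_type} discrepancy",
        "body": f"Study {study.accession} was flagged by the AI DSS, "
                "but the finalized impression does not report the finding.",
    }


if __name__ == "__main__":
    study = Study("ACC-0001", "ICH", True, "No acute intracranial hemorrhage.")
    if is_discordant(study):
        # Hypothetical recipients: interpreting radiologist, division chief, on-call lead.
        print(build_notification(study, ["reader@example.org", "chief@example.org"]))
```

Only AI-flagged studies are checked here, mirroring the one-directional design described above: the notification is raised when the AI DSS is positive and the report is not, and re-review by a radiologist decides whether an addendum is needed.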
Results:
A total of 4338 NCCTs (15%), 591 cervical spine CTs (6%), and 863 CTPAs (11%) were flagged as positive by the AI DSS for ICH, fracture, and PE, respectively. Among flagged studies, NLP-based software detected discordance between radiologist and AI DSS interpretations in 0.41% (n = 18) of ICH cases, 0.68% (n = 4) of cervical spine fracture cases, and 1.3% (n = 11) of PE cases. Among these discrepant cases, 13 of 18 (72%) ICH cases, 1 of 4 (25%) cervical spine fracture cases, and 7 of 11 (64%) PE cases were determined to be AI DSS true positives following cross-check email notification and re-review by the interpreting radiologist (and/or division chief/radiologist leadership on-call). For the true positives, addenda were placed acknowledging the positive finding, and the responsible clinical provider was notified.
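The reported percentages follow directly from the counts given in the text. The short sketch below recomputes them as a consistency check; the dictionary layout and the `pct` helper are illustrative assumptions, and the pooled totals at the end are derived here from the per-category counts rather than stated in the abstract.

```python
# Counts as reported in the abstract; the structure and key names are assumed.
COUNTS = {
    "ICH": {"scanned": 29403, "flagged": 4338, "discordant": 18, "true_pos": 13},
    "C-spine fracture": {"scanned": 9697, "flagged": 591, "discordant": 4, "true_pos": 1},
    "PE": {"scanned": 7902, "flagged": 863, "discordant": 11, "true_pos": 7},
}


def pct(numerator: int, denominator: int, digits: int = 0) -> float:
    """Percentage of numerator over denominator, rounded to `digits` places."""
    return round(100 * numerator / denominator, digits)


def summarize(counts: dict) -> dict:
    """Compute flag rate, discordance rate, and AI true-positive share per category."""
    summary = {}
    for name, c in counts.items():
        summary[name] = {
            "flag_rate": pct(c["flagged"], c["scanned"]),
            "discordance_rate": pct(c["discordant"], c["flagged"], 2),
            "true_pos_share": pct(c["true_pos"], c["discordant"]),
        }
    return summary


if __name__ == "__main__":
    for name, row in summarize(COUNTS).items():
        print(name, row)
    # Pooled totals derived from the per-category counts above.
    total_discordant = sum(c["discordant"] for c in COUNTS.values())
    total_true_pos = sum(c["true_pos"] for c in COUNTS.values())
    print("pooled:", total_true_pos, "of", total_discordant)
```

Note that the discordance rates use flagged studies, not all scanned studies, as the denominator, since only AI-flagged reports were cross-checked.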
Conclusion:
In real-world use, inadvertent discordance between radiologist and AI DSS CT interpretation occurs in a small number of cases. NLP-based software can be leveraged to rapidly detect these discrepancies, notify the interpreting radiologist, and prevent potentially missed diagnoses.