E1443. Diagnostic Performance of Artificial Intelligence-Optimized ACR-TIRADS in a Community Practice Setting: A Single Center Experience
  1. Kareem Elfatairy; Yale New Haven Health-Bridgeport Hospital
  2. Pranav Sharma; Yale New Haven Health-Bridgeport Hospital
  3. Mohammed Osman; Yale New Haven Health-Bridgeport Hospital
  4. Ayah Megahed; Yale New Haven Health-Bridgeport Hospital
  5. Anas Bamashmos; Yale New Haven Health-Bridgeport Hospital
  6. Ashkan Behzadi; Yale New Haven Health-Bridgeport Hospital
  7. Steven Cohen; Yale New Haven Health-Bridgeport Hospital
Artificial intelligence has recently been used to propose an optimized version of ACR-TIRADS (AI-TIRADS) (1). However, the proposed scoring system was tested on a dataset from a single academic institution and has not been externally validated. The purpose of this study is to evaluate the diagnostic performance of the proposed AI-TIRADS scoring in comparison with the original ACR-TIRADS in a community hospital setting.

Materials and Methods:
After obtaining institutional review board approval, we retrospectively evaluated 100 thyroid nodules that were sampled by fine-needle aspiration (FNA) between November 2016 and December 2019. We excluded nodules with indeterminate or inadequate samples. The final study population comprised 88 nodules in 68 patients. Each nodule was retrospectively assigned both an ACR-TIRADS and an AI-TIRADS score. FNA and/or surgical pathology results served as the reference standard. AI-TIRADS was then compared with ACR-TIRADS in terms of sensitivity and specificity. The reclassification rate by AI-TIRADS and the resulting changes in FNA recommendations were also evaluated.
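
Whether a change in TI-RADS level changes the FNA recommendation depends on nodule size. The following is a minimal sketch, assuming the published ACR TI-RADS size thresholds (FNA at 25 mm or larger for TR3, 15 mm or larger for TR4, 10 mm or larger for TR5, no FNA for TR1 and TR2) and assuming the same thresholds are applied to levels assigned by AI-TIRADS. The function names are hypothetical and the example nodule is illustrative, not study data.

```python
# Largest-diameter thresholds (mm) at which ACR TI-RADS recommends FNA.
# TR1 and TR2 carry no FNA recommendation at any size.
FNA_THRESHOLD_MM = {3: 25.0, 4: 15.0, 5: 10.0}


def recommends_fna(tr_level: int, size_mm: float) -> bool:
    """Return True if a nodule of this TR level and size meets the FNA threshold."""
    threshold = FNA_THRESHOLD_MM.get(tr_level)
    return threshold is not None and size_mm >= threshold


def recommendation_change(acr_level: int, ai_level: int, size_mm: float) -> str:
    """Classify how the FNA recommendation differs between the two scores.

    Hypothetical helper; assumes AI-TIRADS levels use the same size thresholds.
    """
    before = recommends_fna(acr_level, size_mm)
    after = recommends_fna(ai_level, size_mm)
    if before == after:
        return "unchanged"
    return "FNA -> no FNA" if before else "no FNA -> FNA"


# Illustrative nodule (not study data): 20 mm, TR4 by ACR-TIRADS, TR3 by AI-TIRADS.
print(recommendation_change(4, 3, 20.0))
```

A downgrade only alters management when the nodule's size falls between the thresholds of the old and new levels, which is why most reclassified nodules can keep the same recommendation.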

Results:
A total of 88 thyroid nodules with prior FNA in 68 subjects were included in this analysis. In the fifteen subjects who underwent thyroidectomy, surgical pathology results served as the reference. Median age was 57 years (range 19-83). 59/68 subjects were female (86.8%) and 9/68 were male (13.2%). Median nodule size (based on largest diameter) was 20.5 mm (range 7-55). 78/88 nodules were benign (88.6%) and 10/88 were malignant (11.4%). Based on ACR-TIRADS, 1/88 nodules was TR1 (1.1%), 8/88 were TR2 (9.1%), 36/88 were TR3 (40.9%), 25/88 were TR4 (28.4%), and 18/88 were TR5 (20.5%). ACR-TIRADS recommended FNA for 49/88 nodules (55.7%). AI-TIRADS downgraded 23 nodules and upgraded 4 nodules. In the downgraded subset, the recommendation changed from performing FNA to not performing FNA for 4/23 nodules (17.4%), which resulted in one missed cancer. In the upgraded subset, the recommendation changed from not performing FNA to performing FNA for 2/4 nodules (50%), with no additional cancer detected. Sensitivity was 70% (95% CI 34.8-93.3%) for ACR-TIRADS vs 60% (95% CI 26.2-87.8%) for AI-TIRADS, and specificity was 46.2% (95% CI 34.7-57.8%) vs 47.4% (95% CI 36-59%), respectively; neither difference was statistically significant.
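
The reported sensitivity and specificity can be checked against the stated totals of 10 malignant and 78 benign nodules. The sketch below is an assumption-laden illustration: the numerators (7/10, 6/10, 36/78, 37/78) are back-calculated from the reported percentages rather than stated in the abstract, and the exact (Clopper-Pearson) interval is assumed because the abstract does not name its confidence interval method. The helper names are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo: float = 0.0, hi: float = 1.0, iters: int = 100) -> float:
    """Root of a decreasing function f on [lo, hi], found by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Exact two-sided (1 - alpha) confidence interval for the proportion k/n."""
    # Lower bound solves P(X >= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p))
    )
    # Upper bound solves P(X <= k | p) = alpha / 2.
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p) - alpha / 2
    )
    return lower, upper


# Numerators back-calculated from the reported percentages (assumption).
results = {
    "ACR-TIRADS sensitivity": (7, 10),
    "AI-TIRADS sensitivity": (6, 10),
    "ACR-TIRADS specificity": (36, 78),
    "AI-TIRADS specificity": (37, 78),
}

for name, (k, n) in results.items():
    lo, hi = clopper_pearson(k, n)
    print(f"{name}: {k}/{n} = {k / n:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

With only 10 malignant nodules, a single missed cancer moves sensitivity by 10 percentage points, so the wide, overlapping intervals are expected.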

Conclusion:
Artificial intelligence-optimized TIRADS (AI-TIRADS) simplified the original ACR-TIRADS with no statistically significant impact on diagnostic performance in this community hospital cohort. Prospective studies in larger populations are warranted for further refinement.