5457. Leveraging Advanced Language Models for Language-Agnostic Structuring and Automatic Annotation of Radiology Reports: A Rib Fracture Study
Authors * Denotes Presenting Author
  1. Nitai Bar *; Rambam Healthcare Campus
  2. Anat Illivitzki; Rambam Healthcare Campus
  3. Eyal Bercovich; Rambam Healthcare Campus
Objective:
Radiology reports contain a wealth of expert-level annotations of pathology, but usually as unstructured free text. This study evaluates the capacity of the GPT-3.5 Turbo and GPT-4 models to transform free-text Hebrew radiology reports into language-agnostic structured data, automatically annotate reports for acute findings, recommend further management, and expedite data preparation for artificial intelligence (AI) development in medical imaging. We focus on rib fractures as a proof of concept.

Materials and Methods:
Our dataset comprised 8317 anonymized trauma protocol CTA reports from 1/1/2012 – 1/4/2022, manually annotated for acute rib fractures. Unlike typical natural language processing (NLP) models, which rely on task-specific classifiers, the GPT models were used to interpret linguistic variation flexibly and to extract meaningful features. They were tasked with predicting, from the reports, the same labels assigned by manual annotation. Performance was evaluated using mean area under the curve (AUC), F1, and exact match scores.
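
The exact match and F1 scores named above can be sketched as follows for multi-label annotations. This is a minimal illustration, not the study's evaluation code: the label names and the four toy reports are hypothetical, since the abstract does not list the actual label set. AUC is omitted because it requires a continuous score per label rather than a binary prediction.

```python
from typing import Dict, List

# Hypothetical label names; the abstract does not specify the real ones.
LABELS = ["acute_rib_fracture", "multiple_fractures", "displaced_fracture"]

Annotation = Dict[str, int]  # label name -> 0 or 1


def exact_match(y_true: List[Annotation], y_pred: List[Annotation]) -> float:
    """Fraction of reports whose full predicted label set equals the manual one."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def f1_for_label(y_true: List[Annotation], y_pred: List[Annotation], label: str) -> float:
    """Binary F1 for a single label, computed from TP, FP, and FN counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[label] and p[label])
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t[label] and p[label])
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[label] and not p[label])
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_f1(y_true: List[Annotation], y_pred: List[Annotation]) -> float:
    """Unweighted (macro) mean of the per-label F1 scores."""
    return sum(f1_for_label(y_true, y_pred, lab) for lab in LABELS) / len(LABELS)


# Toy data: manual annotations versus model predictions for four reports.
y_true = [
    {"acute_rib_fracture": 1, "multiple_fractures": 1, "displaced_fracture": 0},
    {"acute_rib_fracture": 0, "multiple_fractures": 0, "displaced_fracture": 0},
    {"acute_rib_fracture": 1, "multiple_fractures": 0, "displaced_fracture": 1},
    {"acute_rib_fracture": 1, "multiple_fractures": 0, "displaced_fracture": 0},
]
y_pred = [
    {"acute_rib_fracture": 1, "multiple_fractures": 1, "displaced_fracture": 0},
    {"acute_rib_fracture": 0, "multiple_fractures": 0, "displaced_fracture": 0},
    {"acute_rib_fracture": 1, "multiple_fractures": 0, "displaced_fracture": 0},
    {"acute_rib_fracture": 1, "multiple_fractures": 1, "displaced_fracture": 0},
]

print("exact match:", exact_match(y_true, y_pred))
print("mean F1:", mean_f1(y_true, y_pred))
```

Exact match is the stricter of the two measures: a report counts only if every label agrees, whereas the macro-averaged F1 gives each label, including rarer ones, equal weight.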

Results:
Acute rib fractures were identified in 2076 of 8317 (25%) reports. Preliminary findings suggest exact match scores > 85% and F1 scores > 90%, indicating that the models can auto-annotate all labels, including rarer ones. The GPT models thus allow systematic generation of structured radiology reports, which facilitates immediate notification of acute findings, recommendations for further management, and rapid data preparation for AI applications in medical imaging.
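
To illustrate what a language-agnostic structured report might look like downstream, the sketch below parses a model reply into a typed record and flags it for notification. Everything here is an assumption for illustration: the JSON field names, the record type, and the example reply are hypothetical and do not come from the study. The point is only that, once the output uses fixed keys, the language of the source report no longer matters to the consuming code.

```python
import json
from dataclasses import dataclass


@dataclass
class RibFractureRecord:
    # Hypothetical fields; the study's actual schema is not given in the abstract.
    acute_rib_fracture: bool
    fracture_count: int
    recommendation: str


def parse_reply(reply: str) -> RibFractureRecord:
    """Convert a JSON reply with fixed keys into a typed record."""
    data = json.loads(reply)
    return RibFractureRecord(
        acute_rib_fracture=bool(data["acute_rib_fracture"]),
        fracture_count=int(data.get("fracture_count", 0)),
        recommendation=str(data.get("recommendation", "")),
    )


def needs_notification(record: RibFractureRecord) -> bool:
    """Flag reports with an acute finding for immediate notification."""
    return record.acute_rib_fracture


# Invented example reply, standing in for model output on one report.
example_reply = (
    '{"acute_rib_fracture": true, "fracture_count": 3, '
    '"recommendation": "clinical correlation"}'
)
record = parse_reply(example_reply)
print(record, needs_notification(record))
```

The same records could be aggregated over time to produce the finding-identification statistics used for data preparation and monitoring.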

Conclusion:
Advanced language models show strong potential for structuring and annotating radiology reports in a language-agnostic manner, accelerating data readiness for AI development, and monitoring AI model drift by generating finding-identification statistics.