2858. Natural Language Processing Techniques for Identifying Critical Results on Thyroid Ultrasounds: Towards Automated Patient-Centered Reporting
Authors * Denotes Presenting Author
  1. Chris George; University of Toronto
  2. Nandni Mithia; School of Medicine, University College Dublin
  3. Andrew Brown *; University of Toronto; Unity Health Division of Vascular and Interventional Radiology

Objective:
Radiology reports are complex documents created for medical, legal, and administrative purposes, and they are not optimized for patient comprehension. This study explores the feasibility of using natural language processing (NLP) to identify critical results in thyroid ultrasound reports as a way to support patient-centered reporting and notification practices.

Materials and Methods:
We identified 1777 reports at our institution from January to December 2021 with an indication of benign or malignant thyroid lesions. The research team manually annotated each report as either a critical or a noncritical result. We randomly divided the dataset into a training set (70%), a test set (15%), and a validation set (15%). To perform this binary classification task, we developed a series of traditional NLP models, including bag-of-words (BoW) and term frequency-inverse document frequency (TF-IDF) models, as well as transformer-based Bidirectional Encoder Representations from Transformers (BERT) models (Bio_ClinicalBERT). Model performance was measured with accuracy, precision, recall, and F1 score.
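The split and the two traditional text representations described above can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the seed, the rounding of the 70/15/15 fractions, the whitespace tokenizer, the unsmoothed `tf * log(N / df)` weighting, and the toy report snippets are all assumptions made for illustration.

```python
import math
import random
from collections import Counter

def split_indices(n, train_frac=0.70, test_frac=0.15, seed=0):
    """Shuffle range(n) and cut it into train / test / validation index lists.

    Assumption: fractions are truncated to whole reports and the
    remainder goes to the validation set.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(train_frac * n)
    n_test = int(test_frac * n)
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

def bag_of_words(text):
    """Map lowercased whitespace tokens to their raw counts."""
    return Counter(text.lower().split())

def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    bows = [bag_of_words(d) for d in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for bow in bows for term in bow)
    n = len(docs)
    return [{t: c * math.log(n / df[t]) for t, c in bow.items()} for bow in bows]

# 1777 is the number of reports stated in the abstract.
train, test, val = split_indices(1777)
print(len(train), len(test), len(val))

# Hypothetical report snippets, invented for illustration only.
toy_reports = [
    "solid hypoechoic nodule with microcalcifications",
    "stable benign nodule no suspicious features",
    "no nodule seen",
]
weights = tf_idf(toy_reports)
print(weights[0])
```

In this sketch a term that occurs in every document, such as "nodule", receives a weight of zero, while a term confined to one document keeps a positive weight. Library implementations often use smoothed variants of the weighting, so their values would differ from these.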

Results:
Exploratory data analysis demonstrated class imbalance, with noncritical results representing the majority class (77.6%). The dataset consisted of a mix of standardized and nonstandardized radiology reports. Each model was assessed on precision, recall, and F1 score for each class. The Bio_ClinicalBERT model [critical - precision (0.98), recall (0.87), F1 (0.92); noncritical - precision (0.88), recall (0.98), F1 (0.93)] outperformed the traditional NLP models: BoW [critical - precision (0.94), recall (0.74), F1 (0.83); noncritical - precision (0.79), recall (0.95), F1 (0.86)] and TF-IDF [critical - precision (0.88), recall (0.78), F1 (0.83); noncritical - precision (0.80), recall (0.89), F1 (0.84)].
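Because F1 is the harmonic mean of precision and recall, the reported F1 values can be recomputed from the reported precision and recall. The short Python sketch below does this for the six figures above; the helper name `f1_score` and the dictionary layout are illustrative choices, and the numbers are copied from the results as stated.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (model, class) -> (precision, recall, reported F1), as stated in the results.
reported = {
    ("Bio_ClinicalBERT", "critical"): (0.98, 0.87, 0.92),
    ("Bio_ClinicalBERT", "noncritical"): (0.88, 0.98, 0.93),
    ("BoW", "critical"): (0.94, 0.74, 0.83),
    ("BoW", "noncritical"): (0.79, 0.95, 0.86),
    ("TF-IDF", "critical"): (0.88, 0.78, 0.83),
    ("TF-IDF", "noncritical"): (0.80, 0.89, 0.84),
}

for (model, cls), (p, r, f1) in reported.items():
    recomputed = f1_score(p, r)
    print(f"{model:16s} {cls:11s} reported={f1:.2f} recomputed={recomputed:.2f}")
```

Each recomputed value agrees with the reported F1 to two decimal places.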

Conclusion:
Natural language processing can identify critical results in thyroid ultrasound reports, with the Bio_ClinicalBERT model reaching F1 scores of 0.92 and 0.93 for the critical and noncritical classes, respectively. This approach has the potential to change how patients consume and act on the information contained in their radiology reports. Future work will explore patient perceptions of these automated reporting tools as well as their effect on patient-reported outcomes.