E1630. A Deep Learning Algorithm for Triaging Metastatic Spinal Cord Compression on Computed Tomography (CT)
  1. James Hallinan; National University Hospital
  2. Lei Zhu; National University Hospital
  3. Wenqiao Zhang; National University Hospital
  4. Tricia Kuah; National University Hospital
  5. Beng Chin Ooi; National University Hospital
  6. Swee Tian Quek; National University Hospital
  7. Andrew Makmur; National University Hospital
Objective:
Metastatic spinal cord compression (MSCC) is an oncological emergency. Timely diagnosis and treatment are important to prevent irreversible neurological injury. Prior studies have shown the feasibility of Computed Tomography (CT) imaging in detecting MSCC. This provides an opportunity for earlier detection of MSCC in oncology patients who routinely undergo staging CT scans. This study aims to develop a deep learning (DL) algorithm for detecting and grading the severity of MSCC on CT studies, and to assess its performance using internal and external test datasets.

Materials and Methods:
A retrospective review over a 13-year period retrieved 420 staging CT studies from 225 patients that were eligible for inclusion in the training/validation and internal test datasets. Of these, 354 (84%) were used for training/validation of the DL model and the remaining 66 (16%) for internal testing. The external test dataset comprised 43 staging CT studies from 32 patients at a different institution. All of these patients had undergone spine MRI within 60 days of the CT study. Two subspecialized radiologists (11 and 6 years of experience, specializing in musculoskeletal (MSK) imaging and neuroimaging, respectively) used the MRI studies to label the images, and these labels served as the reference standard. The internal and external test datasets were then labeled by the developed DL model and by four subspecialized radiologists (3 to 7 years of experience, two specializing in MSK imaging and two in body imaging). Inter-rater agreement (Gwet’s kappa), sensitivity, specificity, and areas under the receiver operating characteristic curve (AUCs) were calculated. For studies in the internal test dataset, the performance of the DL model and of the four subspecialized radiologists was also compared with that of the original radiology reports produced at the time of scanning.
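
The agreement statistic named above can be illustrated with a short sketch. It assumes the unweighted two-rater form of Gwet's coefficient (AC1) applied to trichotomous grades coded 0 (normal), 1 (low-grade), and 2 (high-grade). The abstract does not state which variant or software was used, so the function name `gwet_ac1`, the grade coding, and the example labels are all hypothetical.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b, categories):
    """Unweighted Gwet's AC1 for two raters (illustrative sketch only).

    rater_a, rater_b: equal-length sequences of category labels.
    categories: every possible label, e.g. [0, 1, 2] for
                normal / low-grade / high-grade (assumed coding).
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must label the same non-empty set of studies")
    if len(categories) < 2:
        raise ValueError("at least two categories are required")

    n = len(rater_a)
    q = len(categories)

    # Observed agreement: fraction of studies given the same grade.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Average marginal proportion of each category across both raters.
    counts = Counter(rater_a) + Counter(rater_b)
    pi = [counts[k] / (2 * n) for k in categories]

    # Chance agreement as defined for AC1.
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)

    return (p_a - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Made-up labels for ten studies; not data from the study.
    reference = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
    model = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
    print(round(gwet_ac1(reference, model, [0, 1, 2]), 3))
```

Unlike Cohen's kappa, AC1 derives chance agreement from the averaged category prevalence, which makes it less sensitive to skewed grade distributions such as a test set dominated by normal studies.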

Results:
The DL model showed high inter-rater agreement with the reference standard for trichotomous MSCC grading into normal, low-grade, and high-grade MSCC (kappa = 0.872 on the internal test dataset, p < 0.001; kappa = 0.844 on the external test dataset, p < 0.001). The DL model also demonstrated superior inter-rater agreement compared with two radiologists on the internal test dataset (one specializing in MSK imaging, kappa = 0.795, p < 0.001; the other in body imaging, kappa = 0.724, p < 0.001), and compared with one radiologist on the external test dataset (specializing in body imaging, kappa = 0.721, p < 0.001). Of note, the DL model and all four radiologists showed superior inter-rater agreement (kappa = 0.603 to 0.849) compared with the original radiology reports (kappa = 0.027).

Conclusion:
We developed a DL model for MSCC detection and grading on staging CT studies. On both internal and external testing, its inter-rater agreement was comparable or superior to that of subspecialized radiologists performing targeted assessment of MSCC, and on the internal test dataset it was superior to that of the original radiology reports. Such a DL model could improve and expedite MSCC diagnosis on staging CT studies, allowing earlier treatment for better patient outcomes.