E2574. Diagnostic Accuracy of Deep Learning-Based T Staging of Colorectal Cancer Using CT Images
  1. Yeo Eun Han; Korea University Anam Hospital
  2. Yongwon Cho; Korea University Anam Hospital
  3. Beom Jin Park; Korea University Anam Hospital
  4. Min Ju Kim; Korea University Anam Hospital
  5. Na Yeon Han; Korea University Anam Hospital
  6. Ki Choon Sim; Korea University Anam Hospital
  7. Deuk Jae Sung; Korea University Anam Hospital
Objective:
To evaluate the diagnostic accuracy of a deep learning model that predicts low versus high T stage of colorectal cancer (CRC) on CT images.

Materials and Methods:
This retrospective study included 636 consecutive CRC surgery cases performed between February 2016 and April 2019. Patients with a diagnosis of adenocarcinoma who had undergone contrast-enhanced abdominal CT at our institution within 30 days before surgery were enrolled and randomly allocated to the training and internal test datasets (n = 535 and n = 101, respectively). An additional 94 patients with CRC from other institutions were enrolled for external validation. T stage was dichotomized as low (Tis, T1, T2) or high (T3, T4). The tumor volume of interest was delineated on portal venous phase images in the training data. A deep learning model combining a 3D convolutional neural network and a vision transformer was trained on the tumor segmentations and 64 × 64 × 64-voxel bounding boxes of the CRCs using fivefold cross-validation for 100 epochs, with the Adam optimizer and a cosine learning rate schedule. The model was then evaluated on the internal test dataset and the external validation dataset. To compare the diagnostic performance of the classifier with that of human readers, an expert abdominal radiologist (11 years of experience), a radiology resident (3 years of experience), and a clinician who does not treat CRC (7 years since licensure) were asked to stage the CRCs in the external dataset. We calculated the F1 score, accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) for the model and for each reader.
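A minimal Python sketch of the data-preparation and training-schedule pieces named above, assuming details the abstract does not give. The low/high grouping follows the text; collapsing substages such as T4a/T4b to their main stage is an assumption. `crop_bounds` is a hypothetical rule that centers the 64 × 64 × 64 box on the tumor mask's extent and shifts it to stay inside the volume. `cosine_lr` implements plain cosine annealing over the stated 100 epochs, with a hypothetical base rate of 1e-4 and no warm-up or restarts. `fivefold_splits` assigns the 535 training cases to five folds at random with an arbitrary seed. The 3D CNN and vision transformer themselves are not sketched.

```python
import math
import random

# Binary grouping stated in the abstract: Tis, T1, T2 -> low (0); T3, T4 -> high (1).
LOW_T = {"Tis", "T1", "T2"}
HIGH_T = {"T3", "T4"}


def t_stage_label(t_stage: str) -> int:
    """Map a T stage string to 0 (low) or 1 (high)."""
    # Assumption: substages such as "T4a" or "T4b" collapse to their main stage.
    base = "Tis" if t_stage == "Tis" else t_stage[:2]
    if base in LOW_T:
        return 0
    if base in HIGH_T:
        return 1
    raise ValueError(f"unrecognized T stage: {t_stage!r}")


def crop_bounds(mask_min, mask_max, volume_shape, size=64):
    """Per-axis (start, stop) indices of a size^3 box around the tumor mask.

    Hypothetical placement rule: center the box on the mask's voxel extent,
    then shift it so it stays inside the volume. Padding is not sketched.
    """
    bounds = []
    for lo, hi, dim in zip(mask_min, mask_max, volume_shape):
        if dim < size:
            raise ValueError("volume axis shorter than the crop size")
        start = (lo + hi) // 2 - size // 2
        start = max(0, min(start, dim - size))  # clamp to [0, dim - size]
        bounds.append((start, start + size))
    return bounds


def cosine_lr(epoch, total_epochs=100, base_lr=1e-4, min_lr=0.0):
    """Cosine-annealed learning rate; base_lr and min_lr are hypothetical."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def fivefold_splits(n_cases, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # round-robin fold assignment
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


if __name__ == "__main__":
    print([t_stage_label(t) for t in ["Tis", "T1", "T2", "T3", "T4a"]])
    print(crop_bounds((100, 100, 10), (120, 130, 30), (512, 512, 200)))
    print([round(cosine_lr(e), 7) for e in (0, 50, 100)])
    print([len(val) for _, val in fivefold_splits(535)])
```

In a framework such as PyTorch, the same schedule is available as `torch.optim.lr_scheduler.CosineAnnealingLR`, typically paired with `torch.optim.Adam`; the pure-Python version only makes the formula explicit.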

Results:
The internal test dataset included 79 high- and 22 low-T-stage cases, and the external dataset included 80 high- and 14 low-T-stage cases. The F1 score of the classifier was 0.87 in internal validation and 0.90 in external validation. The accuracy, sensitivity, specificity, and AUC of the classifier were 0.80, 0.92, 0.32, and 0.72, respectively, in internal validation and 0.84, 0.86, 0.71, and 0.78, respectively, in external validation. The F1 scores of the expert abdominal radiologist, radiology resident, and clinician were 0.93, 0.89, and 0.78, respectively. Their accuracy, sensitivity, specificity, and AUC were 0.88, 0.93, 0.64, and 0.78 (expert abdominal radiologist); 0.83, 0.84, 0.79, and 0.81 (radiology resident); and 0.68, 0.65, 0.86, and 0.75 (clinician).
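The external-validation rates can be checked with a short worked example. The integer counts in this Python sketch are not reported in the abstract: they are the counts, with high T stage as the positive class, that are consistent with the 80 high / 14 low external cases and the rounded rates, back-calculated here for illustration. `binary_metrics` is a hypothetical helper. It also shows that for hard binary calls, where the ROC curve has a single operating point, the area under the curve reduces to (sensitivity + specificity) / 2.

```python
# Counts (tp, fn, tn, fp) are NOT reported in the abstract; they are
# back-calculated from the 80 high / 14 low external cases and the rounded
# rates, for illustration only. Positive class = high T stage.
EXTERNAL_COUNTS = {
    "classifier": (69, 11, 10, 4),
    "expert radiologist": (74, 6, 9, 5),
    "radiology resident": (67, 13, 11, 3),
    "clinician": (52, 28, 12, 2),
}


def binary_metrics(tp, fn, tn, fp):
    """Reader-study metrics with high T stage as the positive class."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Hard binary calls give one ROC operating point; the trapezoidal area under
    # (0,0) -> (1 - spec, sens) -> (1,1) simplifies to (sens + spec) / 2.
    single_point_auc = (sensitivity + specificity) / 2
    return {
        "f1": f1,
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "single_point_auc": single_point_auc,
    }


if __name__ == "__main__":
    for name, counts in EXTERNAL_COUNTS.items():
        metrics = binary_metrics(*counts)
        print(name, {k: round(v, 3) for k, v in metrics.items()})
```

With these counts, (sensitivity + specificity) / 2 rounds to the readers' reported AUCs (0.78, 0.81, and 0.75). A model's AUC is normally computed by sweeping a threshold over its continuous output scores, so the single-point value for the classifier need not match its reported 0.78. The same arithmetic makes the internal specificity concrete: with 22 low-stage cases, 0.32 corresponds to about 7 correctly classified cases (7/22 ≈ 0.318).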

Conclusion:
On the external dataset, the deep learning model predicted low versus high T stage of CRC with an F1 score, accuracy, and sensitivity similar to those of the radiology resident. Such a model could assist clinicians in predicting the T stage of CRC on CT images, with performance comparable to that of a radiology resident.