ERS5717. Comparing a Generalized Deep Learning Model to a Traditional Supervised Model for Liver and Spleen Segmentation on Abdomen/Pelvis CT
Authors * Denotes Presenting Author
  1. Clifford Danza *; University of California Irvine
  2. Gillean Cortes; University of California Irvine
  3. Chanon Chantaduly; University of California, Irvine Center for Artificial Intelligence in Diagnostic Medicine
  4. Anthony Wu; University of California Irvine
  5. Erwin Ho; University of California Irvine
  6. Peter Chang; University of California Irvine; University of California, Irvine Center for Artificial Intelligence in Diagnostic Medicine
  7. Roozbeh Houshyar; University of California Irvine

Objective:
Computed tomography (CT) is commonly used in the evaluation and diagnosis of hepatosplenomegaly. However, organ volume is often estimated from the largest single-axis measurement of the organ, a heuristic vulnerable to confounding factors such as patient sex and body size. Machine learning (ML) models offer promise for improved organ measurement. Supervised ML has been shown to achieve high accuracy at the cost of intensive manual labeling and low generalizability. Unsupervised ML may reduce the labeling burden while maintaining efficacy, but the literature currently lacks comparisons of the performance and efficiency of supervised and unsupervised ML models for abdominal organ segmentation. This study aims to compare the performance and time savings of an unsupervised deep learning algorithm with those of supervised convolutional neural networks (CNNs) specialized for liver and spleen volume segmentation.

Materials and Methods:
A new unsupervised ML model (ATLAS), trained on 14,366 unlabeled abdominal/pelvis CT scans, was developed to segment structures in the abdominal cavity. For comparison, two 3D-2D U-Net CNNs were developed, one for the liver and one for the spleen, each trained on 500 manually labeled abdominal/pelvis CT scans. All models were tested on a random 1% subset (100 scans) of 10,000 sequential single-institution abdominal/pelvis CT scans of adults without acute liver or spleen pathology. Ground-truth segmentations were created manually by medical students and verified by a radiology resident or attending physician. Annotators self-reported the time spent on manual annotation of each scan in the test cohort. Dice coefficients between each model's output and the ground truth were calculated to compare model performance.
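For readers unfamiliar with the metric, the Dice coefficient between a predicted mask A and a ground-truth mask B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that standard definition, not the study's evaluation code. The function name `dice_coefficient`, the flattened-list mask representation, and the toy masks are assumptions made for illustration only.

```python
def dice_coefficient(pred, truth):
    """Dice similarity of two equal-length binary masks (flattened voxels)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled as organ in both masks.
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total organ voxels in each mask, summed.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy 1-D masks (hypothetical); real masks would be flattened 3-D CT label volumes.
model_mask = [0, 1, 1, 1, 0, 0]
truth_mask = [0, 0, 1, 1, 1, 0]
print(f"Dice = {dice_coefficient(model_mask, truth_mask):.3f}")
```

A value of 1 indicates perfect overlap and 0 indicates no overlap, so higher values mean closer agreement with the manual segmentation.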

Results:
The median Dice coefficient between ATLAS and ground-truth liver segmentation was 0.930 (interquartile range 0.917 - 0.942), compared to 0.964 (0.959 - 0.967) between the CNN and ground truth. The median Dice coefficient between ATLAS and ground-truth spleen segmentation was 0.867 (0.829 - 0.890), compared to 0.939 (0.926 - 0.948) between the CNN and ground truth. Total time spent manually segmenting the liver for the test cohort was 1064 minutes (mean 10.64 min/scan). Total time spent manually segmenting the spleen for the test cohort was 805 minutes (mean 8.05 min/scan).
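Summary statistics of this form (median with interquartile range) can be computed from per-scan Dice values as sketched below. The per-scan values are hypothetical placeholders, not study data, and the quartile method (the default "exclusive" method of Python's `statistics.quantiles`) is an assumption, since the abstract does not state which method was used.

```python
from statistics import median, quantiles


def summarize(dice_values):
    """Return (median, first quartile, third quartile) of per-scan Dice values."""
    q1, _, q3 = quantiles(dice_values, n=4)  # default method="exclusive"
    return median(dice_values), q1, q3


# Hypothetical per-scan Dice values, for illustration only.
example_dice = [0.91, 0.92, 0.93, 0.94, 0.95, 0.90, 0.96]
med, q1, q3 = summarize(example_dice)
print(f"median {med:.3f} (interquartile range {q1:.3f} - {q3:.3f})")
```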

Conclusion:
The performance of ATLAS was comparable to that of CNNs specialized for liver and spleen segmentation, demonstrating the usefulness of a generalized, unsupervised ML model for organ measurement. Substantial time was spent on manual image annotation, with about 31 person-hours (1869 minutes) required to annotate both organs in the test set. Extrapolating the measured annotation times to the 500 scans annotated for CNN training yielded an estimated 155.75 person-hours, representing a notable relative time savings for an unsupervised ML model over a supervised model. Comparable performance with extensive time savings highlights a compelling benefit of developing unsupervised ML models in radiology, which have the additional advantage of future application to other organ segmentations with minimal additional training.
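The annotation-time figures can be reproduced with the arithmetic sketched below. It assumes, as the extrapolation in the abstract does, that the mean per-scan annotation time measured on the 100-scan test cohort also applies to the 500 training scans. The variable names are illustrative.

```python
# Reported totals for the 100-scan test cohort (1% of 10,000 scans).
LIVER_MINUTES = 1064
SPLEEN_MINUTES = 805
TEST_SCANS = 100
TRAINING_SCANS = 500

# Combined liver + spleen annotation time for the test cohort.
test_minutes = LIVER_MINUTES + SPLEEN_MINUTES
test_hours = test_minutes / 60

# Assumption: training scans take the same mean time per scan as test scans.
training_minutes = test_minutes * TRAINING_SCANS / TEST_SCANS
training_hours = training_minutes / 60

print(f"Liver:  {LIVER_MINUTES / TEST_SCANS:.2f} min/scan")
print(f"Spleen: {SPLEEN_MINUTES / TEST_SCANS:.2f} min/scan")
print(f"Test cohort: {test_minutes} min = {test_hours:.2f} person-hours")
print(f"Estimated for {TRAINING_SCANS} training scans: {training_hours:.2f} person-hours")
```

The combined test-cohort total is 1869 minutes (about 31.15 person-hours), and scaling by 500/100 gives 155.75 person-hours, matching the reported estimate.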