ERS3027. Proposal of a Novel Self-Supervision Task Exploiting Slice Order in CT/MR Data for Machine Learning Robust to Data and Task Transfer
Authors (* denotes presenting author)
  1. Vineeth Gangaram *; University of Pennsylvania
  2. Spyridon Bakas; University of Pennsylvania
Although Healthcare Artificial Intelligence (HAI) is promising, generalizability remains a concern when models are translated across institutions, reconstruction kernels, and patient populations. Creating high-quality, diverse datasets with associated labels is essential for each HAI application, yet labeling in radiology is limited by the availability of specialized human expertise. One solution is to construct a synthetic task closely enough related to supervised “real” tasks that a network performing well on it would also generalize to those “real” tasks. Such a task can opportunistically exploit implicit structure in real-world data, allowing self-supervised labeling with minimal human intervention. The proposed synthetic task is to predict which of two frames, drawn randomly from an axial Magnetic Resonance (MR)/Computed Tomography (CT) cross-sectional scan, is more cranial. To accomplish this task, a model must identify anatomic landmarks (e.g., the basal ganglia are superior to the pons), a skill generalizable across many “real” tasks. To validate that this skill was learned, the model is incorporated into a supervised (human-labeled) task of predicting whether a CT slice contains various abdominal organs. Finally, without further training, performance on MR images is used to evaluate whether self-supervised pre-training allowed this secondary model to generalize to a different data distribution.
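The self-supervised labeling scheme described above can be sketched as follows. This is an illustrative example rather than the authors' code; the function name, the array layout (axis 0 ordered cranial to caudal), and the sampling details are assumptions.

```python
import numpy as np

def sample_slice_pair(volume: np.ndarray, rng: np.random.Generator):
    """Draw two distinct axial slices from a volume and label which one is
    more cranial. No human annotation is needed: the slice indices
    themselves provide the supervision.

    Assumes axis 0 of `volume` runs cranial -> caudal (index 0 is the
    most cranial slice).
    """
    n_slices = volume.shape[0]
    i, j = rng.choice(n_slices, size=2, replace=False)  # two distinct indices
    slice_a, slice_b = volume[i], volume[j]
    # Label 1 if slice_a is the more cranial of the pair, else 0.
    label = int(i < j)
    return slice_a, slice_b, label
```

Each (slice_a, slice_b, label) triple can then be fed to the twin network as one training example, which is what lets the approach scale to unlabeled archives of cross-sectional data.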

Materials and Methods:
Axial MR (n=42) and CT (n=1051) volumes for this study came from the publicly available AMOS and AbdomenCT-1K competition datasets. The network architecture is a twin neural network with two ResNet50 backbones. Data were augmented with random windowing, center crops, and random erasing. Labels were generated programmatically by comparing frame numbers. For the supervised slice-wise organ-detection task, CT slices and human labels from the AMOS dataset were used to train a support vector machine that took the output of the ResNet backbone as input. After training on CT data (n=200), this network was evaluated on CT (n=100) and MR (n=10) data.
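The supervised stage described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the frozen self-supervised backbone maps each slice to a 2048-dimensional embedding (the usual ResNet50 pooled-feature size), and it substitutes random arrays for the real embeddings and organ-presence labels.

```python
import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(42)
n_train, n_dim, n_organs = 200, 2048, 11  # 200 CT slices, 11 abdominal organs

# Stand-ins for real data: in the study, X would be embeddings from the
# frozen slice-ordering backbone and y the human organ-presence labels.
X_train = rng.normal(size=(n_train, n_dim))
y_train = rng.integers(0, 2, size=(n_train, n_organs))

# Organ presence is a multilabel problem, so one SVM is fit per organ
# on top of the shared embedding.
clf = MultiOutputClassifier(SVC(kernel="rbf"))
clf.fit(X_train, y_train)

preds = clf.predict(X_train[:5])  # binary predictions, shape (5, n_organs)
```

Because the backbone stays frozen, evaluating on MR only requires extracting embeddings from MR slices and reusing the same SVMs, which is what makes the cross-distribution test in this study possible without retraining.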

Results:
Accuracy on the self-supervised frame-ordering task was 99% on CT and 97% on MR. Accuracy on the labeled organ-presence task (CT/MR) was: spleen 92/87%, kidney 94/90%, gallbladder 92/84%, esophagus 98/96%, liver 94/86%, stomach 91/87%, aorta 97/98%, vena cava 97/95%, pancreas 95/89%, adrenal gland 96/89%, and duodenum 94/89%.

Discussion:
Although this work demonstrates immediate applications in anatomical series identification and sequence registration, its future value lies in improving model generalization. By demonstrating that a self-supervised model trained on distributions A (CT) and B (MR), paired with a supervised model trained only on data from A, suffers only mild performance degradation on B, this work lays the groundwork for demonstrating similar benefits when A and B are different hospitals, reconstruction kernels, or patient populations. Because frame ordering requires no explicit labels, this approach can scale to any existing cross-sectional data and serve as the backbone for many robust models.