ARRS 2022 Abstracts

1644. Performance of a Highly-Accurate Prostate Cancer Algorithm on MR Image Data From a Different Institution: Do the Results Hold Up?
Authors (* denotes presenting author)
  1. Destie Provenzano; George Washington University Hospital
  2. Oleksiy Melnyk *; George Washington University Hospital
  3. Michael Whalen; George Washington University Hospital
  4. Murray Loew; George Washington University Hospital
  5. Shawn Haji-Momenian; George Washington University Hospital
Objective:
This study aims to determine the accuracy of a highly effective prostate cancer machine learning algorithm, trained and tested on MR image data from a single institution, when applied to image data from a different institution.

Materials and Methods:
The National Cancer Institute (NCI) and International Society for Optics and Photonics (SPIE) "PROSTATEx Challenge" dataset consisted of 344 open-source annotated prostate MRI examinations from a single institution (Radboud University Medical Center, Nijmegen, The Netherlands). Imaging was performed on a 3T scanner without an endorectal coil, with 3.6 mm thick T2 and diffusion-weighted imaging (DWI) slices (b = 50, 400, 800). Highly accurate machine learning algorithms were developed for the challenge to predict which MRI-detected lesions identified by radiologists were clinically significant prostate cancer, defined as ≥ Gleason Grade 2 (Gleason 3+4) on MR-guided biopsy; top algorithms in published studies from the Challenge reached accuracies greater than 95% using a T2 sequence or ADC maps. A residual neural network (ResNet) algorithm, similar to those used in the PROSTATEx Challenge, was trained and tested on the PROSTATEx image data and achieved comparable results. This algorithm was then tested on 41 prostatectomy patients from our institution with preoperatively identified MRI lesions. Imaging at our institution was performed on a 3T scanner without an endorectal coil, with 3 mm thick T2 and DWI slices (b = 50, 800, 1400). The algorithm was similarly tested for accuracy using a T2 sequence or ADC map.
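For readers who want a concrete picture of the model class described above, the following is a minimal sketch of a ResNet-style binary classifier for single-channel lesion patches (T2 or ADC), written in PyTorch. The patch size (64x64), layer widths, and the names LesionResNet/ResidualBlock are illustrative assumptions; the abstract does not specify the authors' exact architecture, preprocessing, or framework.

```python
# Minimal sketch, assuming 64x64 single-channel lesion crops and a small
# ResNet-style backbone; not the authors' exact implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    """3x3 conv -> BN -> ReLU -> 3x3 conv -> BN, with an identity (or 1x1) skip."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # Project the skip connection when the spatial size or channel count changes.
        self.skip = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.skip = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.skip(x))


class LesionResNet(nn.Module):
    """Maps a single-channel lesion patch (T2 or ADC) to two logits:
    non-significant vs. clinically significant cancer."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1, bias=False), nn.BatchNorm2d(32), nn.ReLU()
        )
        self.layer1 = ResidualBlock(32, 64, stride=2)   # 64x64 -> 32x32
        self.layer2 = ResidualBlock(64, 128, stride=2)  # 32x32 -> 16x16
        self.head = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.layer2(self.layer1(self.stem(x)))
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)      # global average pooling
        return self.head(x)


if __name__ == "__main__":
    # Smoke test on random tensors standing in for a batch of lesion patches.
    model = LesionResNet()
    patches = torch.randn(8, 1, 64, 64)       # 8 single-channel 64x64 crops
    labels = torch.randint(0, 2, (8,))        # 1 = clinically significant
    loss = F.cross_entropy(model(patches), labels)
    loss.backward()
    print(model(patches).argmax(dim=1))
```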

Results:
There were 2, 11, 16, 3, and 9 Gleason Grade 1, 2, 3, 4, and 5 prostate cancers, respectively, among the 41 prostatectomy patients at our institution. The ResNet algorithm, when trained and tested on image data from the PROSTATEx Challenge, had accuracies of 96.0% and 93.4% in the classification of clinically significant prostate cancer using the T2 sequence and ADC map, respectively; this was similar to the top-performing results obtained in the Challenge. The ResNet algorithm had accuracies of 12.5% and 60.9% when tested on the T2 sequence and ADC map image data from our institution. When our institution's patients with Gleason Grade 2 disease were reclassified as "non-significant cancer" (Gleason Grade 1), the accuracy of the algorithm increased to 50.0% using the T2 sequence and decreased to 50.1% using the ADC map.
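As an illustration of the reclassification step only (not the study data), the snippet below relabels Gleason Grade 2 lesions as non-significant and recomputes accuracy against fixed model predictions; the arrays and the accuracy helper are hypothetical stand-ins.

```python
# Illustrative sketch, assuming per-lesion Gleason grades and fixed binary
# predictions; the numbers here are placeholders, not the reported results.
import numpy as np

def accuracy(preds, labels):
    return float(np.mean(preds == labels))

grade = np.array([2, 2, 3, 5, 1, 4, 3, 2])   # hypothetical Gleason Grade per lesion
preds = np.array([1, 0, 1, 1, 0, 1, 0, 1])   # algorithm calls: 1 = significant

labels_original = (grade >= 2).astype(int)   # Grade >= 2 counted as significant
labels_reclass = (grade >= 3).astype(int)    # Grade 2 reclassified as non-significant

print(accuracy(preds, labels_original), accuracy(preds, labels_reclass))
```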

Conclusion:
Highly accurate algorithms that are trained and tested on image data from a single institution can have very low accuracy when tested on image data from a different institution. Differences in imaging parameters and histologic classification between institutions may account for the discordant results. Training algorithms on multi-institutional image data will likely be necessary in the future.