2355. Biopsy Outcome Prediction in Contrast-Enhanced Mammography: Machine Learning of Radiologists’ Interpretation vs. Deep Learning Model
Authors * Denotes Presenting Author
  1. Chang Liu; University of Pittsburgh Medical Center
  2. Priya Patel *; University of Pittsburgh Medical Center
  3. Margarita Zuley; University of Pittsburgh Medical Center
  4. Dooman Arefan; University of Pittsburgh Medical Center
  5. Shandong Wu; University of Pittsburgh Medical Center
Contrast-enhanced mammography (CEM) is clinically interpreted primarily with qualitative descriptors, guided by the recently issued BI-RADS supplement for CEM. The value of quantitative analysis in CEM has not been established. We aim to develop a quantitative analysis based on deep learning and to compare it with a machine learning model built on the qualitative descriptors used by radiologists for predicting malignant versus benign biopsy outcome.

Materials and Methods:
Under a HIPAA-compliant, IRB-approved protocol, 287 consenting patients from a single institution with 327 BI-RADS 4A/4B/4C or 5 breast lesions detected with tomosynthesis and/or ultrasound underwent CEM prior to biopsy. CEM images were acquired with low- and high-energy exposures in the craniocaudal (CC) projection followed by the mediolateral oblique (MLO) projection, with the side ipsilateral to the index lesion acquired first; postprocessed dual-energy subtraction (DES) images were then obtained for each view. Biopsy showed that the cohort included 76 cancers (45 IDC, 10 ILC, 16 DCIS, three invasive mammary carcinomas, one LCIS, and one poorly differentiated carcinoma) and 251 benign lesions. Lesion annotations and qualitative clinical descriptors were provided by a single radiologist. Two approaches were used to predict biopsy outcome: machine learning based on the qualitative descriptors and deep learning based on the annotated CEM images. For the qualitative approach, background parenchymal enhancement (BPE), lesion strength of enhancement, and lesion kinetics were used to build a machine learning model. For the quantitative deep learning approach, multiple bounding-box patches of the annotated CEM lesions were used to train a neural network. The quantitative percentage of BPE over the breast was calculated using an automatic pipeline conceptually similar to those used for breast MRI. AUC and PPV were used as evaluation metrics.
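The two evaluation metrics named above can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `auc` and `ppv`, the 0.5 threshold, and the toy labels and scores are all assumptions made for illustration. AUC is computed in its rank (Mann-Whitney) form, and PPV is the fraction of lesions called malignant that are truly malignant.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen malignant lesion gets a
    higher score than a randomly chosen benign one; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ppv(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Positive predictive value (precision): TP / (TP + FP) at a threshold."""
    called = [y for y, s in zip(labels, scores) if s >= threshold]
    if not called:
        raise ValueError("no lesion was called malignant at this threshold")
    return sum(called) / len(called)


# Toy data, invented for illustration only (1 = malignant, 0 = benign).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.2, 0.1]

toy_auc = auc(toy_labels, toy_scores)                 # 13 of 15 pairs ranked correctly
toy_ppv = ppv(toy_labels, toy_scores, threshold=0.5)  # 2 of 3 calls are malignant
```

AUC does not depend on a threshold, whereas PPV does, so a reported PPV is only interpretable together with the operating point at which it was measured.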

The deep learning model without quantitative BPE achieved an AUC of 0.70 (precision = 83%) on the MLO DES images and 0.67 (precision = 80%) on the CC DES images. This precision is substantially higher than the well-established BI-RADS-based PPV3 of 20-40%. Overall, deep learning showed higher AUCs on DES images than on low-energy images for both CC and MLO views. For BPE alone, the qualitative and quantitative assessments yielded AUCs of 0.70 (std = 0.09) and 0.69 (std = 0.09), respectively. The machine learning model of qualitative descriptors showed an AUC of 0.80 with qualitative BPE included and 0.70 without it.
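As a point of reference for the PPV3 comparison, the lesion-level malignancy rate of this cohort can be worked out from the counts given in Materials and Methods. The short sketch below is an illustrative calculation, not part of the study's analysis; the variable names are assumptions. Because every lesion in the cohort was biopsied, the malignancy rate equals the PPV of the biopsy recommendation itself.

```python
# Counts taken from the Materials and Methods section of this abstract.
cancers = 76
benign = 251
lesions = cancers + benign  # 327 biopsied lesions

# If every lesion is biopsied, each cancer is a true positive and each
# benign lesion is a false positive, so PPV equals the malignancy rate.
baseline_ppv = cancers / lesions

print(f"Lesion-level malignancy rate: {baseline_ppv:.1%}")  # 23.2%
```

This rate of about 23% falls within the 20-40% PPV3 range quoted above. The reported model precision values depend on the decision threshold and evaluation split used, which the abstract does not state.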

Our results demonstrate that deep learning shows a higher predictive value than radiologist interpretation for distinguishing malignant from benign lesions interpreted as BI-RADS 4A or higher. The qualitative CEM descriptors used by radiologists can also be modeled by machine learning with comparable performance, although such descriptors are less reproducible. BPE appears to be a predictive marker, but the effect of quantitative BPE merits further evaluation in combination with the deep learning models.