1920. External Validation of a Commercial Artificial Intelligence Algorithm on a Large Diverse Population for Detection of Interval Cancers
Authors (* denotes presenting author)
  1. Steven Plimpton; University of California Los Angeles
  2. Hannah Milch; University of California Los Angeles
  3. Christopher Sears; University of California Los Angeles
  4. James Chalfant; University of California Los Angeles
  5. Cheryce Fischer; University of California Los Angeles
  6. William Hsu; University of California Los Angeles
  7. Melissa Joines *; University of California Los Angeles
Purpose:
Several breast artificial intelligence (AI) programs with regulatory approval are available commercially, promising to increase cancer detection rates. However, whether AI can consistently deliver these benefits when applied to diverse populations is largely underexplored. Previous studies have demonstrated the potential of AI in detecting interval cancers (IC) but were largely performed using enriched data sets. This study examines the characteristics of IC that were detected by an AI algorithm using data from a real-world population.

Materials and Methods:
A subset of digital 2D screening mammograms acquired between December 2010 and October 2015 at our institution was analyzed using an FDA-cleared AI system (Transpara v1.7, ScreenPoint Medical). The AI system assigned each screening examination a malignancy-risk score of 1-10, with scores of 1-7 considered negative (BI-RADS 1 or 2) and scores of 8-10 reflecting an increased likelihood of malignancy. Descriptive statistics were computed for IC, defined as malignancies diagnosed within 12 months of a negative screening examination.

Results:
A total of 26,702 screening mammograms from 20,409 women (54% White, 10% Hispanic, 9% Asian, 9% mixed, 8% Black, 7% not specified, 2% other, and 0.1% American Indian/Alaskan Native; mean [SD] age 58.1 [11.3] years and BMI 28.7 [6.6]) were analyzed. This population contained a total of 167 malignancies (125 invasive and 42 in situ), including 19 IC and 148 screening-detected cancers. The abnormal interpretation rate (AI score of 10) was 13.7%. The AI system detected 157 malignancies (94%) and 16 IC (84%), the latter initially missed by the radiologist. Of these AI-detected IC, 14 were invasive and two were in situ (density counts: A=1, B=4, C=9, D=2; mean [SD] age 58.1 [13.1] years and BMI 27.2 [7.1]). Invasive IC detected by AI were majority luminal A/B subtype, with 12 (75%) ER-positive, 11 (69%) PR-positive, zero HER2-positive, and 13 (81%) with Ki-67 = 20%. The AI system identified IC a median (SD) of 245 (137) days earlier than radiologist detection.

Conclusions:
Increased IC detection in a large, diverse cohort may be possible when AI is used as a radiologist-assist tool. This AI system identified 16 cancers initially missed by a radiologist (84% of all IC in the cohort), with a propensity to detect invasive disease, spanning all breast densities in a distribution similar to that of the general screening population. The AI-generated malignancy-risk score can identify patients who may benefit from additional imaging and clinical evaluation earlier than radiologist review alone, although at the cost of an increased number of cases flagged for review. Large prospective studies are needed to evaluate AI reliability and performance in combination with radiologists in a real-world clinical setting.