1883. Reader Agreement of LI-RADS v2018
Authors (* denotes presenting author)
  1. Cheng Hong *; University of California, San Diego; University of California, San Francisco
  2. Victoria Chernyak; Memorial Sloan Kettering Cancer Center
  3. Jin Young Choi; Yonsei University
  4. Sonia Lee; University of California, Irvine
  5. Tanya Wolfson; University of California, San Diego
  6. Kathryn Fowler; University of California, San Diego
  7. Claude Sirlin; University of California, San Diego
Purpose:
Prior research evaluating agreement for the Liver Imaging Reporting and Data System (LI-RADS) has been limited by single-center readers; single-center, single-modality case sets; nonscrollable images; or lack of comparison to clinical reads. This study evaluates interreader agreement for LI-RADS v2018 among readers from multiple international centers, using an international, multicenter, multimodality case set with scrollable images, and includes comparison to clinical reads.

Materials and Methods:
This was an international, multicenter reader study of clinical multiphase CT and MRI examinations from six institutions in three countries. Deidentified examinations and clinical reports in unique patients with at least one untreated LI-RADS observation were submitted to a central coordinating center, where one untreated observation per examination was randomly selected. The clinically assigned categories and features for that observation were extracted from the corresponding clinical report, a LI-RADS v2018 category was computed from the features, and the observation was electronically annotated. Each annotated examination was uploaded to a cloud-based reading platform and randomly assigned to two of 39 research readers. The readers independently characterized major and ancillary features and assigned a LI-RADS category, blinded to the clinical reads. The primary endpoint was agreement for a modified four-category LI-RADS scale, computed using intraclass correlation coefficients (ICCs). Secondary endpoints were binary agreement, computed using ICCs, for LI-RADS categories dichotomized as probably or definitely malignant vs. not (LR-4/5/M/TIV vs. LR-1/2/3) and as LR-5 vs. not LR-5. Agreement was computed for research-versus-research reads and for research-versus-clinical reads. ICCs were compared pairwise using a nonparametric bootstrap with per-case resampling.
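The ICC and bootstrap comparison described above can be sketched as follows. This is a minimal illustration, not the study's actual analysis code: it assumes a one-way random-effects ICC for two raters and a percentile-style bootstrap p-value, and all function names and parameters are hypothetical.

```python
import numpy as np

def icc_two_raters(a, b):
    # One-way random-effects ICC for paired ratings (a_i, b_i) on the same cases.
    # (Illustrative; the study's exact ICC model is not specified in the abstract.)
    x = np.stack([a, b], axis=1).astype(float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)        # between-case mean square
    msw = ((x - row_means[:, None]) ** 2).sum() / (n * (k - 1))  # within-case mean square
    return (msr - msw) / (msr + (k - 1) * msw)

def bootstrap_icc_diff(ref, reads1, reads2, n_boot=2000, seed=0):
    # Compare ICC(ref, reads1) vs. ICC(ref, reads2) with per-case resampling:
    # each bootstrap replicate redraws whole cases (rows) with replacement.
    rng = np.random.default_rng(seed)
    n = len(ref)
    observed = icc_two_raters(ref, reads1) - icc_two_raters(ref, reads2)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)
        diffs[i] = icc_two_raters(ref[idx], reads1[idx]) - icc_two_raters(ref[idx], reads2[idx])
    # Two-sided bootstrap p-value: how often the resampled difference crosses zero.
    p = 2 * min((diffs <= 0).mean(), (diffs >= 0).mean())
    return observed, min(p, 1.0)
```

Per-case (rather than per-read) resampling keeps the two reads of each case together, preserving the paired structure that the ICC comparison depends on.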

Results:
A total of 484 patients (156 women; mean age, 62 ± 10 years) with 93 CT examinations and 391 MRI examinations were included. For the four-category LI-RADS scale, agreement was significantly higher among research reads than between research and clinical reads (ICC, 0.68 vs. 0.62; p = 0.025). In subanalyses, the difference was significant for MRI (ICC, 0.68 vs. 0.61; p = 0.020) but not for CT (ICC, 0.68 vs. 0.66; p = 0.66). Agreement for probably or definitely malignant (LR-4/5/M/TIV) was significantly higher among research reads than between research and clinical reads (ICC, 0.63 vs. 0.53; p = 0.005). Agreement for LR-5 vs. not LR-5 was numerically higher among research reads than between research and clinical reads (ICC, 0.58 vs. 0.53; p = 0.14), but the difference was not significant.

Conclusion:
Agreement for LI-RADS in our international multicenter cohort was moderate overall. Research readers agreed slightly more with each other than with the clinical reads for the modified four-category LI-RADS scale, for probable or definite malignancy, and for LR-5, suggesting that prior studies assessing reader agreement only in the research setting may have overestimated the reproducibility of LI-RADS in clinical practice.