1081. Multitask Ensembling: A Novel Strategy for Combining Neural Networks for Concurrent Radiographic Diagnosis
Authors * Denotes Presenting Author
  1. Samira Masoudi; University of California San Diego
  2. Melina Hosseiny *; University of California San Diego
  3. Brian Hurt; University of California San Diego
  4. Justin Huynh; University of California San Diego
  5. Kent Hall; University of California San Diego
  6. Andrew Yen; University of California San Diego
  7. Albert Hsiao; University of California San Diego
Objective:
Pneumonia and pulmonary edema can share similar radiographic features, making it difficult to discriminate between these entities, which have widely different strategies for clinical management. Convolutional neural networks (CNNs) have been separately proposed for localizing pneumonia and grading severity of pulmonary edema, but it is unclear how these may be optimally combined. Herein, we explore the potential of a new ensemble strategy to combine CNNs, with the hope that the combination may exceed the performance of its parts.

Materials and Methods:
We built upon two previously developed CNNs: i) a U-Net for heatmap localization of pneumonia and ii) a ResNet-152 for inferring NT-proBNP (N-terminal pro-B-type natriuretic peptide) from frontal chest radiographs. We devised a new ensemble CNN architecture and trained it on the same data used to train its component networks: approximately 50,000 radiographs from 25,000 patients, drawn from public and private sources. The ensemble and component CNNs were then evaluated on an independent set of 244 radiographs, retrospectively obtained from our clinical practice, that had serum NT-proBNP measurements within 24 hours. Ground truth for pneumonia consisted of a segmentation of its location and a confidence in the diagnosis (0-100%), each provided by two board-certified radiologists. Ground truth for the presence of pulmonary edema was based on an NT-proBNP threshold of 500. Statistical comparisons included intraclass correlation (ICC), Pearson correlation, paired t-tests for performance comparisons, and area under the receiver operating characteristic curve (AUROC).
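The edema evaluation described above (binarize measured NT-proBNP at the threshold, then score the predicted values with AUROC and Pearson correlation) can be illustrated with a minimal NumPy sketch. This is not the authors' code. The function names, the toy values, and the choice of an inclusive threshold (`>=`) are assumptions made for illustration. AUROC is computed here through the rank-sum (Mann-Whitney) identity, which may differ from the software used in the study.

```python
import numpy as np

def rank_with_ties(values):
    """Return 1-based ranks; tied values share their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    i = 0
    while i < len(values):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic (ties count as one half)."""
    labels = np.asarray(labels, dtype=bool)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative case")
    ranks = rank_with_ties(scores)
    u_stat = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u_stat / (n_pos * n_neg))

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    return float(np.corrcoef(x, y)[0, 1])

# Threshold taken from the abstract; the abstract does not state units.
EDEMA_THRESHOLD = 500.0

# Hypothetical toy values, NOT study data.
measured = np.array([120.0, 480.0, 650.0, 2300.0, 90.0, 1500.0])
predicted = np.array([200.0, 700.0, 520.0, 1800.0, 150.0, 900.0])

# Assumption: values at or above the threshold count as edema.
edema_labels = measured >= EDEMA_THRESHOLD

edema_auroc = auroc(predicted, edema_labels)
correlation = pearson_r(measured, predicted)
```

Using the continuous predicted value as the ranking score means no operating point has to be chosen for the model output. Only the reference measurement is thresholded.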

Results:
When compared against the average radiologist ground truth, ICC for confidence in pneumonia improved from 0.39 with the single-task pneumonia CNN to 0.62 with the ensemble CNN (p < 0.001). The ensemble CNN had significantly lower mean squared error (MSE; 27e-6 vs. 49e-6, p < 0.001) and higher peak signal-to-noise ratio (PSNR; 91 dB vs. 86 dB, p < 0.001), showing greater resemblance to radiologist annotations. AUROC of the ensemble CNN for detection of pneumonia was slightly higher than that of the single-task CNN (80.13% vs. 78.60%). Pearson correlation for inference of NT-proBNP from frontal chest radiographs also improved, from 0.39 with the single-task CNN to 0.43 with the ensemble CNN. AUROC of the ensemble CNN for detection of pulmonary edema was similar to that of the single-task CNN (83.48% vs. 84.12%).
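For readers unfamiliar with the two image-similarity measures, the sketch below computes MSE and PSNR between a predicted heatmap and a reference annotation using the common textbook definition PSNR = 10 log10(peak^2 / MSE). The abstract does not state the peak value, the heatmap normalization, or the exact PSNR formula used, so this sketch is not expected to reproduce the reported figures. The function names and the toy 4x4 arrays are hypothetical.

```python
import numpy as np

def mse(prediction, reference):
    """Mean squared error between two arrays of identical shape."""
    prediction = np.asarray(prediction, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if prediction.shape != reference.shape:
        raise ValueError("prediction and reference must have the same shape")
    return float(np.mean((prediction - reference) ** 2))

def psnr_db(prediction, reference, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE).

    `peak` is the maximum possible pixel value; 1.0 assumes heatmaps
    scaled to [0, 1], which is an assumption of this sketch.
    """
    err = mse(prediction, reference)
    if err == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / err))

# Hypothetical toy data, NOT study data: a 4x4 reference mask with a
# 2x2 region marked, and a prediction that is less confident at one pixel.
reference_map = np.zeros((4, 4))
reference_map[1:3, 1:3] = 1.0
predicted_map = reference_map.copy()
predicted_map[1, 1] = 0.6

toy_mse = mse(predicted_map, reference_map)
toy_psnr = psnr_db(predicted_map, reference_map)
```

Because PSNR is a monotone decreasing function of MSE for a fixed peak, the two measures always rank a pair of models the same way. PSNR only re-expresses the error on a logarithmic scale.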

Conclusion:
The multitask ensemble CNN more accurately localized pneumonia and predicted serum NT-proBNP on chest radiography than its component single-task CNNs. When trained with pulmonary edema as a confounding entity, the ensemble CNN produced estimates of diagnostic confidence for pneumonia that more closely mirrored radiologist confidence. Further investigations will be necessary to determine the impact of these algorithms on the interpretive performance of individual readers in clinical practice.