2024 ARRS ANNUAL MEETING - ABSTRACTS



E5369. Artificial Intelligence Tools: Text-to-Image Generation
Authors
  1. Ahmed Abdelmonem; George Washington University Hospital
  2. Mary Heekin; George Washington University Hospital; George Washington University School of Medicine
  3. Oleksiy Melnyk; George Washington University Hospital; George Washington University School of Medicine
  4. Ahmed Ismail; George Washington University Hospital; George Washington University School of Medicine
  5. Theodore Kim; George Washington University Hospital; George Washington University School of Medicine
  6. Nima Ghorashi; George Washington University Hospital; George Washington University School of Medicine
  7. Ramin Javan; George Washington University Hospital
Background
Text-to-image generation is a rapidly advancing artificial intelligence (AI) capability with a growing number of image generators. Early models used generative adversarial networks (GANs), in which two neural networks, a generator and a discriminator, compete until the generator's output is mistaken for real data; both networks improve image quality as they train. Another process is diffusion, used by models such as OpenAI's DALL-E 2, Midjourney, Stable Diffusion, and Google's Imagen. A diffusion model starts from random noise and removes it through repeated sampling steps until the target image emerges. Diffusion is slow but produces high-quality results. Consistency models are another approach, one that aims to reduce generation time and computation: starting from a noisy input, they translate it directly to data in a single step. This sacrifices some image quality for efficiency, but multistep sampling can improve the output. OpenAI has explored this approach and released its code on GitHub. The newest model, Meta's CM3leon, uses a transformer-based autoregressive architecture that decodes text to generate an image. It employs attention to assess the relevance of input data, and supervised fine-tuning, a key aspect of training large language models (LLMs), to perform multitask instruction following. These features enable CM3leon to train on smaller datasets and use less computing power while generating high-quality images and performing other image-related tasks.
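The iterative denoising loop at the heart of diffusion can be sketched in a few lines. This is a toy illustration only: `predict_noise` and the two-"pixel" data below are assumptions standing in for a trained denoising network and a real image, used purely to show the shape of the reverse process.

```python
# Toy sketch of the reverse ("denoising") loop used by diffusion models.
# The real neural network is replaced by `predict_noise`, an oracle that
# returns the exact noise that was added -- an assumption for illustration.

def reverse_diffusion(x_noisy, predict_noise, steps):
    """Remove a fraction of the predicted noise at each of `steps` iterations."""
    x = list(x_noisy)
    for t in range(steps, 0, -1):          # walk backward from step T to 1
        eps = predict_noise(x, t)          # model's estimate of the noise
        x = [xi - ei / steps for xi, ei in zip(x, eps)]
    return x

# Demo: an "image" of two pixels, corrupted by known noise.
target = [1.0, -0.5]
noise = [0.8, -0.3]
x_noisy = [t + n for t, n in zip(target, noise)]

recovered = reverse_diffusion(x_noisy, lambda x, t: noise, steps=10)
```

In a real diffusion model the noise estimate changes at every step and the update includes learned scaling terms; the sketch keeps only the core idea of repeatedly subtracting predicted noise until the target is reached.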

Educational Goals / Teaching Points
This exhibit explores the rapid evolution of text-to-image generation methods, including GANs, diffusion, consistency models, and transformer architectures. It provides detailed instructions on how to develop effective prompts for creating desired images, especially on Midjourney and DALL-E 2. Midjourney topics include parameters, styles, settings, contexts, blending of images, and using images as input.
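As a rough illustration of the kind of Midjourney prompt the exhibit discusses, a prompt combines a text description with optional parameters such as `--ar` (aspect ratio), `--stylize` (degree of artistic interpretation), and `--no` (elements to exclude). The subject matter and parameter values below are our own illustrative choices, not examples taken from the exhibit:

```
/imagine prompt: 3D render of a human heart, medical illustration style,
soft studio lighting, white background --ar 3:2 --stylize 250 --no text
```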

Key Anatomic/Physiologic Issues and Imaging Findings/Techniques
Multiple categories of potential applications are discussed, including enhancement of baseline 3D visualization images in radiology; creation of images that represent particular clinical symptoms; development of images for patient education, consent materials, and trainee education; and application of art in medical writing. Example images are provided with detailed instructions on how they were created.

Conclusion
Text-to-image generative AI is a rapidly evolving technology, and current models employ a variety of processes for image generation. These advanced tools have potential applications across the radiology field, including education, research, and clinical practice. However, current models have not necessarily been trained on anatomically correct images or tailored for medical applications; these capabilities therefore need to be further explored and fine-tuned for dedicated medical and radiologic purposes.