Multi-modal Generative AI in Healthcare
About GPT-4o, the new multi-modal model recently released by OpenAI, and the potential impact of multi-modal Generative AI on healthcare and life sciences.
There are points in time when you see new technology in action and know you are witnessing a pivotal moment, nothing less than a turning point for an industry.
I can still remember feeling that when I first saw the 360-slow-motion cinematic effect used in "The Matrix", also known as "bullet time". I felt that when I first saw GPT almost two years ago. Last week’s OpenAI demo of GPT-4o was another such moment for me.
If you haven’t seen the demo yet, go watch it now. Like, seriously.
The demo was followed by a Microsoft announcement of GPT-4o preview on Azure OpenAI at the //Build developers conference, and a cool demo of it playing Minecraft.
OpenAI shared multiple demos of GPT-4o showing users interacting with AI through natural spoken language, with video, images, and text combined as input. Super impressive, jaw-dropping at times. A new era of Conversational AI, if you ask me.
Not a word here about the OpenAI / ScarJo mess. Keep reading, people, nothing to see on this one, move along.
The Multimodal Nature of Healthcare
Healthcare data comes in diverse modalities. There’s a lot of textual data - clinical notes and reports, hospitalization summaries, lab reports, and more. Then there's medical imaging: X-ray images, MRIs, and CT scans. Videos bring another layer, capturing things like the rhythm of a beating heart during an echocardiogram, footage from tiny cameras traveling through the body during procedures like colonoscopy, or even a video of the surgeon’s moves in the operating room. There are signals coming from medical devices, genomic sequencing data, and structured data. And let's not forget sound – human speech during medical encounters, the beeps of a heart monitor, or even the whoosh of a Doppler ultrasound catching a baby’s heartbeat. Each modality offers a unique lens into the human body, and together, they create a comprehensive picture.
Healthcare is among the industries producing the most data. A single patient generates nearly 80 megabytes of data each year, across modalities. This data could be leveraged by AI to provide better healthcare and improve services and outcomes, but over 90% of it is not. Or at least, not yet.
Multi-modal Generative AI Models
Multi-modal Generative AI models are designed to handle multiple types of data as input, and potentially also as output. Unlike traditional models that focus on a single data type, multimodal models can process and generate text, images, audio, and even video, integrating these diverse streams of information into a coherent output. This versatility allows the model to understand context more comprehensively, draw richer insights, reach better conclusions, and provide more holistic solutions across various applications.
In healthcare, this could mean creating a more complete and nuanced understanding of patient data, potentially leading to better diagnostics, personalized treatments, and ultimately, improved patient outcomes.
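To make the "multiple modalities in one request" idea concrete, here is a minimal sketch in the style of a chat-completions API, where a single user message carries both a text question and an image reference side by side. This only builds the request payload and makes no API call; the model name, question, and image URL are illustrative placeholders, not a working clinical integration.

```python
# Sketch: composing a multi-modal request payload in the style of a
# chat-completions API. No network call is made; the model name and
# image URL below are illustrative placeholders.

def build_multimodal_request(question: str, image_url: str) -> dict:
    """Bundle a text question and an image reference into one message."""
    return {
        "model": "gpt-4o",  # a multi-modal model accepting text + images
        "messages": [
            {
                "role": "user",
                "content": [
                    # Two content parts, two modalities, one message:
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

request = build_multimodal_request(
    "Summarize notable findings in this chest X-ray for clinician review.",
    "https://example.com/chest-xray.png",  # placeholder image URL
)
print(len(request["messages"][0]["content"]))  # 2 parts: text + image
```

Sending such a payload to a real endpoint would of course require an SDK client, credentials, and appropriate privacy safeguards for patient data; the point here is only that one request can carry several modalities together, which is what lets the model reason over them jointly.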
Potential Use Cases
Thinking of the potential use cases is nothing less than mind-blowing for me. Here are some immediate examples that come to mind:
Think of an operating room AI that not only reads and understands prior medical reports of the patient, but also analyzes medical images, watches a surgical procedure, listens to the conversation in the operating room and interacts with humans over speech.
Or think of an application that combines patient data across images, text, and genetic info to suggest personalized treatment plans to the clinician.
Or an interactive, empathetic AI assistant for patient support, combining text, video, and speech understanding for better telemedicine and remote care.
Or a training simulator for medical education that combines text, images, and videos for more effective learning of future clinicians.
Or an application for drug discovery, that utilizes chemical structures, biological data, and information from research papers to accelerate the discovery and development of drugs.
Exciting, right? And those are just a few examples.
What does this all mean?
These examples illustrate the powerful potential of multi-modal Generative AI to revolutionize healthcare and life sciences, making care more personalized, efficient, and accessible.
But with this new era of multi-modal generative models, comes also a new era of Responsible AI, that will need to take those multiple modalities into account, input and output alike.
In other words, as Spider-Man’s uncle said: with great power comes great responsibility.
About Verge of Singularity.
About me: Real person. Opinions are my own. Blog posts are not generated by AI.
LinkedIn: https://www.linkedin.com/in/hadas-bitran/
X: @hadasbitran
Instagram: @hadasbitran