Context Engineering in Healthcare Agents
About the GPT-5 identity crisis, what is context engineering, and why context is critical in healthcare agents.
Last Friday, GPT-5 was released and, to no one’s surprise, triggered a little media circus. I got early access and, like everyone else, started kicking the tires.
But our initial acquaintance was somewhat confusing. While the ChatGPT app clearly said “ChatGPT5”, the model seemed to be having a small identity crisis. When I explicitly asked whether it was GPT-5 or GPT-4.5, it replied with the AI equivalent of “it’s complicated”, saying that it’s “GPT-4.5 in name, GPT-5 in spirit”.
Yeah, bro. I’m tall and blond in spirit.
Looks like the model was missing some context about its own identity.
And in AI, my friends, context is everything.

What do we mean by Context?
For AI systems, context is the background knowledge, situational awareness, and relevant history that allow the system to respond appropriately. In the universe of agents, context is everything the model sees before it generates an answer – far beyond the text of a single user question. This includes the system instructions, the history of the conversation, short- and long-term memory, information retrieved from different sources, what kind of tools the model can run, and more.
In the universe of agents, context is everything the model sees before it generates an answer.
What is Context Engineering then?
Context Engineering has been the hot new topic over the last month. It is a fancy new term that reflects the idea that working with generative AI models goes beyond just effective prompting, aka Prompt Engineering. Context Engineering is the practice of designing and building systems that provide the AI model with the right tools and the right information, in the right format, and at the right time to complete a task. A few paradigms have surfaced; let’s see how this applies to healthcare.
Context Engineering is designing systems that provide the AI model with the right tools and the right information, in the right format, and at the right time to complete a task.
The Layers of Context in Healthcare Agents
For healthcare agents to work effectively, they need to take several dimensions of context into consideration, some of which are very specific to healthcare:
User Prompt
The user prompt is an obvious part of the context - this is the immediate query from the user. A patient might ask: “Can I take ibuprofen after my knee surgery?”. A physician might ask: “Show me the relevant guidelines for this case - whether the follow-up should be an MRI or a CT”. This is more than just what the end user asked for - we need to understand their intent. And maybe they are actually asking for multiple things in the same query, meaning the system will need to formulate a plan?... More about Planners in a future post.
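To make this concrete, here is a minimal sketch of breaking a user prompt into sub-questions before planning. The function and class names are my own illustrations, not a real library; a production system would use an LLM or a classifier for intent detection rather than punctuation.

```python
from dataclasses import dataclass


@dataclass
class ParsedPrompt:
    """A user prompt broken into sub-questions (illustrative only)."""
    raw: str
    sub_questions: list[str]


def parse_prompt(raw: str) -> ParsedPrompt:
    # Naive split on question marks; a real system would detect
    # intent with a model, not punctuation.
    parts = [p.strip() + "?" for p in raw.split("?") if p.strip()]
    return ParsedPrompt(raw=raw, sub_questions=parts)


query = "Can I take ibuprofen after my knee surgery? How much is safe?"
parsed = parse_prompt(query)
print(parsed.sub_questions)  # two sub-questions -> the agent may need a plan
```

When more than one sub-question comes back, that is the signal for the system to formalize a plan instead of answering in a single shot.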
System Instructions
System instructions, aka the System Prompt, set the mission and provide framing for the agent across interactions: they give it more context on what it is expected to do and, even more importantly, what it is not allowed to do.
One example of providing context in the system instructions is Inception, where you tell the agent who the end user is (aka the Persona) and what that end user expects in the answers. In a healthcare setting, you may want to tell the agent that it is intended to be used by patients, and therefore it should use simpler language and ground its answers in patient-friendly sources. Your end user may be a clinician, meaning they’d be fine with medical jargon and would frown at you if you told them to consult with their doctor. Read more about it in my previous blog about Inception of Healthcare Agents.
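A persona-driven system prompt can be sketched roughly like this. The personas, wording, and helper name are assumptions for illustration; the point is that the same agent gets a different framing depending on who the end user is.

```python
def build_system_instructions(persona: str) -> str:
    """Assemble a system prompt from a persona (hypothetical helper)."""
    base = "You are a healthcare assistant. Never provide a diagnosis."
    if persona == "patient":
        return base + (
            " Use simple, jargon-free language."
            " Ground answers in patient-friendly sources."
        )
    if persona == "clinician":
        return base + (
            " Medical terminology is expected."
            " Cite clinical guidelines; do not tell the user to consult a doctor."
        )
    raise ValueError(f"unknown persona: {persona}")


print(build_system_instructions("patient"))
```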
Short-term Memory
Short-term memory refers to the state and history of the current conversation, as well as the context of the session. But in healthcare, there’s additional short-term context that is critical: the clinical context.
What is Clinical Context?
Clinical context refers to several additional dimensions in the healthcare setting:
Patient context: Who is the patient we are talking about right now?
Patient priors are an important part of the context: labs, imaging reports, and prior clinical notes can all provide important clinical context.
The patient context is critical, and at the same time, it’s PHI. In a multi-agent environment, we need to be mindful of where that PHI goes.
Environmental context: Where is the interaction between the agent and the end user taking place? What’s the setup? Is this happening in an inpatient or an outpatient setting? Hospital or clinic? Is it taking place over speech in a noisy waiting room?
Temporal context: Is the agent expected to provide an answer in real-time, and how long is the end-user willing to wait? If the agent can escalate a conversation with a patient to a human nurse, is the human available 24/7 or should the escalation occur only during business hours?
Operational context: What are the rules of the clinical workflow? What is the role of the end user, what’s their specialty, and what are they permitted to do?
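The four dimensions above can be captured in a single structure that travels with the session. This is a sketch under my own assumptions - the field names and defaults are illustrative - but note the explicit PHI flag, which is what lets a multi-agent system decide where patient data may flow.

```python
from dataclasses import dataclass, field


@dataclass
class ClinicalContext:
    """Illustrative container for the clinical context of a session."""
    # Patient context: this is PHI, so we track it explicitly
    patient_summary: str
    contains_phi: bool = True
    # Environmental context
    care_setting: str = "outpatient"    # e.g. "inpatient", "outpatient"
    modality: str = "text"              # e.g. "text", "speech"
    # Temporal context
    max_latency_seconds: float = 5.0
    escalation_hours: str = "business"  # or "24/7"
    # Operational context
    user_role: str = "nurse"
    permitted_actions: list[str] = field(default_factory=list)


ctx = ClinicalContext(
    patient_summary="58F, s/p knee replacement, on anticoagulants",
    care_setting="inpatient",
    user_role="physician",
    permitted_actions=["order_imaging", "view_labs"],
)
```

Downstream components (retrievers, planners, tool routers) can then read one object instead of re-deriving the setting, the user’s role, and the PHI status on every turn.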
Long-term Memory
Long-term memory refers to information that the agent remembers across interactions. For example, that could be the clinical note templates used by the healthcare organization, hospital guidelines, or standards.
Long-term memory can also include the end-user context: the style and lingo preferences of the specific end user, and prior examples of documents produced by this end user that they want the system to use while generating new documents, aka few-shot examples.
And long-term memory would typically also include the expected output: what should the format of the response be? Free text? Structured FHIR/OMOP data? An EMR data schema? A template to complete?
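An output spec stored in long-term memory can be turned into a prompt instruction at generation time. A minimal sketch, assuming a hypothetical spec format - the field names below are illustrative and not a real FHIR profile:

```python
import json

# Hypothetical output spec stored in long-term memory; the field
# names are illustrative, not a real FHIR profile.
OUTPUT_SPEC = {
    "format": "structured",
    "schema": {
        "resourceType": "Observation",
        "fields": ["code", "value", "unit", "effectiveDateTime"],
    },
}


def format_instruction(spec: dict) -> str:
    """Turn the stored output spec into a prompt instruction."""
    if spec["format"] == "structured":
        return "Respond with JSON matching this schema: " + json.dumps(spec["schema"])
    return "Respond in free text."


print(format_instruction(OUTPUT_SPEC))
```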
Grounding Information
Grounding information typically includes elements of knowledge that provide relevant information for the system to generate answers, i.e., to perform RAG on. In healthcare agents, this information would typically come from credible medical sources, organizational guidelines, data from the EMR, etc. We want to select the most relevant, up-to-date sources of information as context for the system, so we can generate a grounded answer - one that we also want to back up with evidence.
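The retrieve-then-ground loop can be sketched as follows. This is a toy: the retriever ranks by keyword overlap, whereas a production system would use embeddings, recency filters, and source credibility scores; the source ids and texts are made up for illustration.

```python
def retrieve(query: str, sources: list[dict], k: int = 2) -> list[dict]:
    """Toy retriever: rank sources by keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        sources,
        key=lambda s: len(q_words & set(s["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]


def grounded_prompt(query: str, sources: list[dict]) -> str:
    """Build a prompt that forces the model to answer from cited evidence."""
    evidence = "\n".join(f"[{s['id']}] {s['text']}" for s in sources)
    return (
        "Answer using ONLY the sources below and cite their ids.\n"
        f"Sources:\n{evidence}\n\nQuestion: {query}"
    )


sources = [
    {"id": "guideline-12", "text": "Ibuprofen may be used after knee surgery unless contraindicated."},
    {"id": "emr-note-3", "text": "Patient is on anticoagulants."},
    {"id": "policy-7", "text": "Visiting hours are 9am to 5pm."},
]
top = retrieve("Can I take ibuprofen after knee surgery?", sources)
print(grounded_prompt("Can I take ibuprofen after knee surgery?", top))
```

Keeping the source ids in the prompt is what lets the final answer carry evidence the end user can check.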
Available Tools
These are the tools available for the agent to use, like plugins and other agents the system can call. Is the agent also allowed to search the public web? Is the user allowed to choose tools explicitly? Most importantly, which tools are the most suitable right now, and if we need to execute multiple tools, in what order?
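A simple tool-routing policy along those lines might look like this. The registry and the ordering rules are assumptions I made up for illustration - real agents typically let the model choose tools, constrained by exactly this kind of policy.

```python
# Hypothetical tool registry; names are illustrative, not a real API.
TOOLS = {
    "ehr_lookup": {"needs_phi": True, "description": "Fetch patient record"},
    "guideline_search": {"needs_phi": False, "description": "Search clinical guidelines"},
    "web_search": {"needs_phi": False, "description": "Search the public web"},
}


def select_tools(allow_web: bool, phi_allowed: bool) -> list[str]:
    """Filter and order tools for the current turn (toy policy)."""
    selected = []
    if phi_allowed:
        selected.append("ehr_lookup")    # patient data first
    selected.append("guideline_search")  # then credible medical sources
    if allow_web:
        selected.append("web_search")    # public web last, only if permitted
    return selected


print(select_tools(allow_web=False, phi_allowed=True))
# -> ['ehr_lookup', 'guideline_search']
```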
As you can see, providing context to agents holds quite a few additional complexities when it comes to healthcare. Nobody said it was easy.
And back to GPT-5
Initial tests show GPT-5 outperforms GPT-4o in its reasoning and has fewer hallucinations, which is very promising for healthcare use cases. But… the latency. Currently, it takes too long, even when using the “minimal thinking” setting, which makes GPT-5 more suitable for tasks that need deep reasoning rather than speed.
This also highlights the constant need to balance latency and accuracy in production systems.
And as a side note, that balance preference, by itself, is also part of the context.
So - time to share your thoughts. Here’s a poll for the medical professionals in the crowd:
About Verge of Singularity.
About me: Real person. Opinions are my own. Blog posts are not generated by AI.
See more here.
LinkedIn: https://www.linkedin.com/in/hadas-bitran/
X: @hadasbitran
Instagram: @hadasbitran