medical-text-simplification
Using LLMs to Improve Medical Text Simplification Accessibility and Accuracy - Part 1
·14 min read
An overview of how I improved state-of-the-art medical text simplification using LLMs.
Medical information can be hard to understand and feel overwhelming because of its dense technical language. Simplifying it could help make important medical insights accessible to a much wider audience as soon as they're available.
In this blog post, I'll cover the main methods and pipelines I developed to solve this problem using LLMs, and how they work. I'll spare the full technical details, as they won't fit in a bitesize-style blog. Overall, this post is meant to be a preview; my full dissertation will remain private.
This is the first part of my coverage of my dissertation, and it focuses on approaching text simplification in a programmatic and formal way - using a set of discrete simplification operations/edits. The next part will focus on how LLMs can be treated as agents that edit text in a more natural and conversational way, as well as on tackling factual consistency issues.
There's a growing need for medical information that's both easy to understand and reliable.
The most important reason to automate this process is to save medical professionals time and effort, resulting in faster releases of novel knowledge from new medical research.
đź’ˇ Solution: Use LLMs to automatically simplify medical text, ensuring facts are preserved.
There's also a lot of data documenting the poor availability of high-quality medical information.
One of the most valuable datasets for this task is the Cochrane Dataset, which contains 4,458 entries of systematic reviews in healthcare. Each entry includes a complex medical abstract and an expert-written plain language summary (PLS).
Both the original and simplified texts contain multiple sentences. Therefore, this problem is formally named paragraph-level text simplification instead of sentence-level. More on the implications of this later.
Here is an example entry from the dataset:
Medical Abstract - complex
A review of 24 RCTs with 2147 participants found no significant benefit of dopamine agonists over placebo for dropout, abstinence, dependence severity, or adverse events. Two small studies suggested antidepressants may outperform amantadine for abstinence, but evidence quality was low. Overall, current RCT data do not support dopamine agonists for cocaine misuse treatment.
Plain Language Summary - simplified
We reviewed 24 studies with 2147 adults addicted to cocaine and found no benefit of dopamine agonists over placebo for quitting, staying in treatment, or side effects. Antidepressants may work better than amantadine, but the evidence is weak. Overall, there's no good evidence to support using dopamine agonists to treat cocaine addiction.
I will show you the different methods I developed for achieving high-quality simplifications with LLMs while ensuring medical correctness.
The goal is to build an automated system that can simplify complex medical text while preserving its factual content.
Current solutions rely on smaller, less capable pre-trained language models like BERT. These models have some major limitations.
Another big issue is factual consistency. These models can hallucinate details, omit critical information, or otherwise distort the original medical facts.
| Model | What It Does |
|---|---|
| BART-UL | A transformer model that uses a special loss function to avoid jargon (Devaraj et al., 2022). |
| TESLEA | Uses reinforcement learning to balance relevance, readability, and simplicity (Phatak et al., 2022). |
| NapSS | First summarizes the text, then simplifies it using a transformer model (Lu et al., 2023). |
| UL+Decoder | Improves on unlikelihood loss and ranks sentences by readability using beam search (Flores et al., 2023). |
The primary metric for evaluating simplification quality is SARI, which measures how well the generated simplification compares against the expert-written reference text.
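For reference, SARI (Xu et al., 2016) scores the system output against both the source text and reference simplifications, averaging how well the system adds, keeps, and deletes n-grams:

```latex
\mathrm{SARI} = \frac{F_{\mathrm{add}} + F_{\mathrm{keep}} + P_{\mathrm{del}}}{3}
```

Here F_add and F_keep are n-gram F1 scores for correctly added and kept n-grams, P_del is precision for deletions, and each term is averaged over n-gram orders 1 to 4.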
We also use more naive metrics to paint a more complete picture.
LLMs allow a more robust, more performant, and widely applicable solution that won't become redundant due to new model architectures. As new LLMs are released, the system's comprehension automatically improves, assuming the prompting strategies remain compatible.
Prompting is how we communicate with LLMs to guide their behavior. Here are the foundational strategies that improve the quality and control of simplified outputs.
These techniques are applied throughout my entire project.
In zero-shot prompting, we give the model a task without providing any examples. It's simple, but can be hit-or-miss depending on how clear the instruction is.
Simplify the following paragraph for a general audience:
"Dopamine agonists showed no significant benefit compared to placebo in reducing cocaine dependence."
Few-shot prompting improves output by including a few examples of what we expect. It helps the model learn the format and style from the prompt itself.
Simplify the following like the examples:
Original: "This meta-analysis included 24 randomized controlled trials."
Simplified: "We looked at 24 reliable studies."Original: "Adverse events were moderate in severity."
Simplified: "Side effects were not too serious."Now simplify:
"Dopamine agonists showed no significant benefit compared to placebo in reducing cocaine dependence."
Chain-of-thought prompting encourages the model to reason through its steps before producing the final output. This is especially useful for preserving meaning and factual accuracy.
Let's think through the simplification:
- Identify complex terms: "dopamine agonists", "placebo", "cocaine dependence".
- Replace them with simpler words.
- Rewrite the sentence clearly.
Final simplified version:
"Drugs meant to treat addiction didn’t help people with cocaine problems more than fake pills."
You can also improve results by clearly setting the model's role or audience:
You are a medical writer explaining research to the general public. Rewrite the paragraph in plain language, keeping it accurate and clear.
Being specific about what you want and who the model should act as helps produce more relevant and reliable outputs.
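As a sketch of how these last two ideas combine in practice, the role can live in the system message while the chain-of-thought instruction goes in the user message; the model name and exact wording are my assumptions:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_ROLE = (
    "You are a medical writer explaining research to the general public. "
    "Rewrite the paragraph in plain language, keeping it accurate and clear."
)

COT_INSTRUCTION = (
    "Think step by step: identify complex terms, choose simpler replacements, "
    "then rewrite the paragraph clearly. "
    "End your reply with: Final simplified version: <text>"
)

def simplify_with_cot(paragraph: str) -> str:
    """Combine a role (system message) with a chain-of-thought instruction."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable chat model works
        messages=[
            {"role": "system", "content": SYSTEM_ROLE},
            {"role": "user", "content": f"{COT_INSTRUCTION}\n\n{paragraph}"},
        ],
    )
    # Keep only the final answer, discarding the visible reasoning steps.
    reply = response.choices[0].message.content
    return reply.split("Final simplified version:")[-1].strip()
```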
While LLMs are capable of generating simplified text end-to-end, they often do so without understanding the specific operations that underlie high-quality simplification. To emulate expert behavior in a controlled and interpretable way, it's useful to break simplification down into fine-grained operations - both at the sentence and document-level.
These operations focus on rewriting a sentence in isolation and represent the most popular subfield of text simplification. They use techniques such as word replacement, sentence compression, word insertion, and phrase elaboration.
These operations help, but they don't address larger issues like repetition, flow, or overall readability.
This approach considers the entire paragraph or passage, using higher-level operations to improve cohesion and structure.
The main motivation of these operations is that sentence-level operations do not simplify the broader, overall content, since they are isolated and unaware of the surrounding sentence context.
In the Cochrane dataset, the PLS often contains more or fewer sentences than the original complex text. Sentence-level operations cannot support such edits.
These broader changes are essential for making medical reviews easier to read in context - not just sentence by sentence.
This informed the design of my first pipeline, Plan-Execute, which uses few-shot prompting to select an action to execute. The chosen action is output as a JSON object, which automatically selects a single document- or sentence-level operation. Each operation is itself implemented as a few-shot prompt.

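A minimal sketch of the Plan-Execute idea, assuming the generic llm() helper from earlier and placeholder operation names of my own:

```python
import json

# Hypothetical operation names; each value is that operation's few-shot prompt.
OPERATIONS = {
    "replace_jargon": "<few-shot prompt for word replacement>",
    "compress": "<few-shot prompt for sentence compression>",
    "merge_sentences": "<few-shot prompt for joining sentences>",
    "delete_sentence": "<few-shot prompt for deleting a sentence>",
}

def plan_execute_step(text: str, llm) -> str:
    """One iteration: the planner picks an action as JSON, then we execute it."""
    plan_prompt = (
        f"Pick ONE simplification operation from {sorted(OPERATIONS)} "
        "for the text below. Reply with JSON only: "
        '{"operation": "<name>", "target": "<the span to edit>"}\n\n'
        "Text:\n" + text
    )
    plan = json.loads(llm(plan_prompt))
    op_prompt = OPERATIONS[plan["operation"]]
    return llm(f"{op_prompt}\n\nEdit this span: {plan['target']}\n\nFull text:\n{text}")
```

In practice the JSON sometimes arrives wrapped in prose, so a retry-on-parse-failure guard (or a strict response format) is worth adding.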
The main issue lies in action selection and its efficiency - the model gets overwhelmed and cannot decide reliably. The process therefore needs to be made more modular and broken down further.
I began by analysing the types of sentences found in each abstract, prompting an LLM to classify each sentence with zero-shot prompting.
| Category | Description |
|---|---|
| Introductory | Defines the study, number of trials, participant demographic, location, and time-frame |
| Results | Presents or explains collected data and observed outcomes of the study |
| Recommendation | Suggestions aimed at patients |
| Limitation | Constraints and factors that affected research |
| Conclusion | Summary of the main outcomes |
| Other | Used when an LLM is uncertain of classification |
This analysis revealed the distribution of sentences by topic, allowing the LLM to become more aware of the content it should keep or remove. For example, "Results" and "Recommendation" sentences are the most important for a casual reader to comprehend, while sentences in the "Limitation" and "Other" categories are much less relevant. The LLM can implicitly focus on the more important sentences once the content has been explicitly labelled in this way.
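As a sketch of this classification step (the prompt wording is my paraphrase), with the category list taken from the table above and llm() being the generic helper from earlier:

```python
CATEGORIES = ["Introductory", "Results", "Recommendation",
              "Limitation", "Conclusion", "Other"]

def classify_sentence(sentence: str, llm) -> str:
    """Zero-shot: label one abstract sentence with exactly one category."""
    prompt = (
        "Classify this sentence from a medical abstract into exactly one of "
        f"{CATEGORIES}. Reply with the category name only.\n\nSentence: {sentence}"
    )
    label = llm(prompt).strip()
    return label if label in CATEGORIES else "Other"  # fall back on odd replies
```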
We'll focus on how we can extract the most important sentences to keep in the simplified text, applying our knowledge from document-level simplification. This LLM-based approach offers a more interpretable and flexible filtering process than existing methods.
Classify Then Filter Pipeline
This method applies our classification prompt and adds a step that selects k sentences, using Chain-of-Thought prompting to reason about the choice.

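The added filtering step could look like the following sketch: the labelled sentences go into a single CoT prompt that reasons about each category's value before naming the k indices to keep (function and marker names are hypothetical):

```python
def select_k_sentences(labelled, k, llm):
    """CoT filter: reason about each labelled sentence, then keep k of them.

    labelled: list of (category, sentence) pairs from the classification step.
    """
    listing = "\n".join(
        f"{i}. [{cat}] {sent}" for i, (cat, sent) in enumerate(labelled)
    )
    prompt = (
        f"From the labelled sentences below, choose the {k} most important "
        "for a plain language summary. Think step by step about what each "
        "category contributes, then end with a line:\n"
        "KEEP: <comma-separated indices>\n\n" + listing
    )
    reply = llm(prompt)
    indices = [int(i) for i in reply.rsplit("KEEP:", 1)[-1].split(",")]
    return [labelled[i][1] for i in indices]
```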
Generate and Match Key Points Pipeline
This method applies the inverse logic of the previous classification method: we match sentences to a list of important key points in order to prune irrelevant detail.
To focus abstracts around their core themes, we first generate a set of concise key points summarizing the full content. These are created using few-shot prompts to guide style and brevity. We then filter out non-essential points with Chain-of-Thought (CoT) prompting, requiring justification for each removal.
Next, each sentence from the abstract is matched to this refined list using a zero-shot prompt. If a sentence cannot be linked to any key point, it is excluded. This selective filtering helps retain only the most relevant content, ensuring the final output reflects the abstract's central ideas.

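A sketch of the matching step, assuming the key points have already been generated and pruned; each sentence gets a zero-shot yes/no check against the key-point list:

```python
def match_to_key_points(sentences, key_points, llm):
    """Zero-shot matching: keep a sentence only if it supports some key point."""
    points = "\n".join(f"- {p}" for p in key_points)
    kept = []
    for sentence in sentences:
        prompt = (
            "Key points:\n" + points +
            "\n\nDoes the sentence below support any of these key points? "
            "Answer YES or NO.\n\nSentence: " + sentence
        )
        if llm(prompt).strip().upper().startswith("YES"):
            kept.append(sentence)
    return kept
```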
Most simplification studies rely on zero- or few-shot prompting. I explored reasoning-enhanced prompting techniques — Chain-of-Thought (CoT) and ReAct — to improve sentence simplification by interleaving reasoning with operations like replacement, compression, etc.
ReSimplify - Reasoning-Guided Simplification
In this sentence-level pipeline, I applied the ReAct prompting technique to text simplification. But what is ReAct?
It is a technique that prompts an LLM to act in an interleaved fashion, combining reasoning steps with actions such as executing tools (e.g. a calculator or search query) or interacting with an API. In ReSimplify, the actions are our sentence-level operations, and the reasoning steps let the model follow a more guided and controlled simplification process that directly engages with the complexity issues of the text.
In short:
Word Replacement, Compress Sentence, Insert Words, Elaborate Phrase, Rewrite Numerical Jargon.
Let's look at an example.
Complex Sentence:
The meta-analysis demonstrated that treatment for CIN did not adversely affect the chances of conception.
Thought 1: Begin by simplifying complex non-medical terms, clarify sentence structure
Action 1: Replacement
Tag:
The meta-analysis <sub>demonstrated that treatment for</sub> CIN <sub>did not adversely affect</sub> the chances of <sub>conception</sub>.
Replacement:
The meta-analysis showed that treating CIN didn’t harm the chances of getting pregnant.
Thought 2: Expand abbreviations.
Action 2: Insertion
Tag:
The meta-analysis showed that treating <ins></ins> CIN didn’t harm the chances of getting pregnant.
Insertion:
The meta-analysis showed that treating cervical intraepithelial neoplasia (CIN) didn’t harm the chances of getting pregnant.
Thought 3: Clarify non-essential medical jargon.
Action 3: Replacement
Tag:
The <sub>meta-analysis</sub> showed that treating cervical intraepithelial neoplasia (CIN) didn’t harm the chances of getting pregnant.
Replacement:
The research review showed that treating cervical intraepithelial neoplasia (CIN) didn’t harm the chances of getting pregnant.
Thought 4: The sentence does not have any remaining complexity issues.
Final Simplified Sentence:
The research review showed that treating cervical intraepithelial neoplasia (CIN) didn’t harm the chances of getting pregnant.
A much more controlled method compared to few-shot and especially zero-shot prompting.
The prompt enables an iterative process to be executed by the LLM, allowing a multi-stage simplification to be completed in one LLM inference step.
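A sketch of what such a single-call prompt could look like; the template is my paraphrase of the trace format shown above, not the dissertation's exact prompt:

```python
RESIMPLIFY_PROMPT = """\
Simplify the sentence below using interleaved Thought/Action steps.
Available actions: Replacement, Compression, Insertion, Elaboration,
Rewrite Numerical Jargon.

For each step write:
Thought N: <the complexity issue to fix next>
Action N: <action name>
Tag: <the sentence with <sub>...</sub> or <ins></ins> marking the edit site>
Result: <the sentence after the edit>

Stop when no complexity issues remain, then write:
Final Simplified Sentence: <text>

Sentence: {sentence}
"""

def resimplify(sentence: str, llm) -> str:
    """Run the whole multi-step ReAct trace in a single inference call."""
    reply = llm(RESIMPLIFY_PROMPT.format(sentence=sentence))
    return reply.rsplit("Final Simplified Sentence:", 1)[-1].strip()
```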

My evaluation concluded that ReSimplify outperforms other prompting strategies and competes with state-of-the-art systems - specifically, it achieved a high SARI score of 40.02 on the Cochrane dataset. This shows the effectiveness and flexibility of reasoning-guided sentence simplification with LLMs; it can easily be applied to other technical domains.
The ReSimplify method can be applied as a post-processing step once we have applied our document-level operations. I followed this idea in my "Filter-Join-Split-Simplify" pipeline.
This method is inspired by Cripwell et al. (2023), who view document-level simplification as a two-step process: first planning which operation to apply to each sentence, then generating the simplified text according to that plan.
We adapt and extend that approach into the following pipeline named Filter-Join-Split-Simplify:

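Chaining the four stages could look like the following sketch, reusing the match_to_key_points and resimplify helpers from the earlier sketches and abridging the join/split prompts:

```python
def filter_join_split_simplify(sentences, key_points, llm):
    """Sketch of the four stages chained end to end (stage prompts abridged)."""
    # 1. Filter: drop sentences that support no key point.
    kept = match_to_key_points(sentences, key_points, llm)
    # 2. Join: merge closely related sentences into compact ones.
    joined = llm(
        "Merge closely related sentences into single concise sentences, "
        "preserving all facts:\n" + "\n".join(kept)
    )
    # 3. Split: break any remaining long sentence into shorter standalone ones.
    split_out = llm(
        "Split any long sentence into shorter standalone sentences, "
        "one per line:\n" + joined
    )
    # 4. Simplify: run ReSimplify over each resulting sentence.
    return " ".join(
        resimplify(s, llm) for s in split_out.splitlines() if s.strip()
    )
```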
This method proved effective at removing unimportant information efficiently, and grouping sentences together in a concise way.
This post covered the motivation, approach, and implementation of applying LLMs to medical text simplification. Specifically, we used a formalized, programmatic approach that relies on repeatedly applying discrete operations to the text.
In Part 2, I'll dive into how LLMs can be reframed as agents and used in multi-agent text simplification pipelines, in order to achieve a more natural simplification process. I'll also discuss methods for tackling factual consistency issues.