An RLM Implementation For Ollama
In a recent post I highlighted MIT's RLM (Recursive Language Models) paper. RLMs are a general inference strategy that treats long prompts as part of an external environment and lets the LLM programmatically examine, decompose, and recursively call itself over snippets of the prompt. The goal is to handle long contexts while maintaining quality through chunking, recursive processing, and synthesis.
Roman Gelembjuk commented on the original post asking whether this would work locally with Ollama, which sent me down a rabbit hole exploring what that would look like and what the benefits would be.
I ended up with an implementation that attempts to honor the spirit of the RLM paper for a local Ollama model.
This implementation:
✅ Parses PDF and DOCX files directly in the single-page browser app, offline
✅ Uses the model to analyze the objective and make recommendations if it is too narrow
✅ Breaks the context into manageable pieces using chunking
✅ Gives each chunk its own LLM call with fresh context (in this case not to avoid context rot, but because the context window of local LLMs is much smaller)
✅ Uses an editable extraction prompt with fact assertion and verified/inferred labelling
✅ Uses a synthesis prompt to combine all extracted facts
✅ Ends with a confidence rating
The REPL and code execution in the MIT paper are ultimately implementation mechanisms, not the core insight, which is: "don't feed the whole thing to the model at once. Let it process pieces independently, then combine."
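To make that loop concrete, here is a minimal TypeScript sketch of chunk → extract per chunk → synthesize against Ollama's local /api/generate endpoint. This is illustrative rather than the repo's actual code: the model name, chunk size, and prompt wording are all assumptions.

```typescript
// Minimal chunk -> extract -> synthesize loop (illustrative sketch, not the repo's code).
// Assumptions: Ollama running on the default port, a locally pulled model named "gemma3",
// and naive fixed-size character chunking.

const OLLAMA_URL = "http://localhost:11434/api/generate";
const MODEL = "gemma3"; // assumption: any locally pulled model name works here

async function generate(prompt: string): Promise<string> {
  const res = await fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: MODEL, prompt, stream: false }),
  });
  const data = await res.json();
  return data.response as string;
}

// Naive chunking: fixed-size character windows so each piece fits
// comfortably inside a small local context window.
function chunk(text: string, size = 6000): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) chunks.push(text.slice(i, i + size));
  return chunks;
}

async function analyze(document: string, objective: string): Promise<string> {
  // 1. Each chunk gets its own fresh LLM call that only extracts relevant facts.
  const extractions: string[] = [];
  for (const piece of chunk(document)) {
    extractions.push(
      await generate(
        `Objective: ${objective}\n\nExtract only facts relevant to the objective from the ` +
          `text below. Label each fact as VERIFIED (directly stated) or INFERRED.\n\n${piece}`,
      ),
    );
  }
  // 2. A final synthesis call combines the per-chunk facts into one answer.
  return generate(
    `Objective: ${objective}\n\nCombine the extracted facts below into a single answer ` +
      `and finish with an overall confidence rating.\n\n${extractions.join("\n---\n")}`,
  );
}
```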
This implementation uses a fact-assertion pattern to ask the model to:
➡️ Assert what it found
➡️ Qualify its confidence
➡️ Distinguish between direct evidence and inference
This is a form of reasoning and verification; it is not as dynamic as code execution, but it is there.
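To give a flavour of that pattern, here is roughly the shape such an extraction prompt can take. The prompt in the app is editable, so treat this as a sketch of the idea rather than the shipped wording.

```typescript
// Sketch of a fact-assertion extraction prompt (illustrative wording only;
// the actual prompt in the app is editable and will differ).
const EXTRACTION_PROMPT = (objective: string, chunkText: string) => `
You are analyzing one chunk of a larger document.
Objective: ${objective}

For each relevant finding, output:
FACT: <assertion of what you found>
CONFIDENCE: high | medium | low
EVIDENCE: VERIFIED (quote the supporting text) or INFERRED (explain the inference)

If nothing in this chunk is relevant to the objective, output: NO RELEVANT FACTS

Chunk:
${chunkText}
`;
```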
From the paper:
"Even without sub-calling capabilities, our ablation of the RLM is able to scale beyond the context limit of the model, and outperform the base model and other task-agnostic baselines on most long context settings."
This approach is analogous to that ablation, at least in spirit.
All state is stored in the browser, so everything is client-side.
What could it be good for? When you want to use local models for these types of AI use cases:
➡️ Tender analysis: Identify bid rejection conditions
➡️ Contract review: Find liability clauses and obligations
➡️ Research synthesis: Extract methodology and findings
➡️ Compliance audit: Identify GDPR violations
The app can be run directly (remember to set OLLAMA_ORIGINS so the browser app can reach the local Ollama models).
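On the OLLAMA_ORIGINS point: the page calls Ollama's HTTP API from the browser, so Ollama must be started with the page's origin allowed or the requests fail CORS checks. A quick connectivity check might look like the sketch below (the wildcard origin is just a permissive example; /api/tags is Ollama's endpoint for listing locally pulled models).

```typescript
// Ollama needs to allow the page's origin, e.g. (permissive example only):
//   OLLAMA_ORIGINS="*" ollama serve
// Without it, browser fetches to the local API are blocked by CORS.
async function listLocalModels(): Promise<string[]> {
  const res = await fetch("http://localhost:11434/api/tags"); // lists pulled models
  const data = await res.json();
  return data.models.map((m: { name: string }) => m.name);
}

listLocalModels().then((names) => console.log("Available local models:", names));
```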
Github repo 👉 https://github.com/jimliddle/Ollama-RLM-Analyzer/tree/main
The animated demo below shows an analysis of a sample tender (downloaded from the web), using the Gemma3:7B model.


