A long-term memory layer for an agent that cut hallucinations and made conversations feel like the user was being remembered.

Drop in hallucination rate

~70%

4 weeks · 1 PM + 1 engineer

Memory lookup latency

< 200ms

Episodic, semantic, procedural

3 layers

Kick-off to production

4 weeks

“Memory used to be the thing that broke first when we scaled. Now it is the thing users compliment.”

CTO, AI-native SaaS

Before

Where the team was when we picked this up.

The agent forgot users between sessions. New conversations felt like cold starts.
Pulling the entire history into context worked at first then broke as users grew.
No way to tell when memory was the cause of a bad answer versus the model itself.

What we built

Three-layer store

Episodic (what happened, when), semantic (what the user is and cares about) and procedural (how this user likes to be handled). Each written and retrieved differently.

Selective recall

A small retrieval model decides what to pull into context for each turn. Cheap, fast, and trained on the team’s real conversations.

Memory evals

A test set of multi-session conversations with expected recall behaviour. Catches regressions before the next release.

What changed

Users describe the agent as feeling like it knows them.
Costs went down because the agent stops dragging entire transcripts into every prompt.
The team can ship model upgrades without holding their breath.