The AI memory problem is getting worse, not better. That's because intelligence is outgrowing memory capacity. The fastest way to address the problem is by building our own memory systems—here's how!
Nate
Since ChatGPT launched, intelligence has scaled roughly 60,000x, while memory has scaled only 100x.
This means that relative to intelligence, the memory problem has gotten almost three orders of magnitude (roughly 600x) WORSE since ChatGPT launched. Not better. 600x worse.
This explains why we have invented entire jobs called Context Engineering.
Why there is a hundred billion dollar industry for memory solution vendors.
Why so much of prompting is about what you can put into a chat to remember.
Most of all, why everyone I talk to is desperately trying to figure out how to fit transcripts, notes, and documents into some kind of order so they can use AI with them.
It’s so hard. And we’re not getting help, and that’s why I wrote this piece.
I’m gonna dig in here and explain:
- Why memory is such a difficult problem to solve
  - Including the 6 root causes that explain why no vendor has solved memory yet
  - I have not seen anyone lay these out this clearly before
  - Frankly, I think a lot of vendors don’t want to talk about them
- The key insights you need to understand how to build your own memory solutions
  - No coding needed!
  - Yes, I have prompts to help
- The 8 scalable principles I use to construct memory systems
  - Yes, they scale from ChatGPT user level to engineering level
  - Yes, you can use the same principles to design agentic systems
  - No coding is needed, but you CAN use these principles to code quickly
  - Again, I haven’t seen anyone lay these out like this before
- Five prompts to get you started on your own memory solution
  - An easy start prompt to help you sketch out a system
  - A Memory Architecture Designer prompt
  - A Context Library Builder prompt
  - A Project Brief Compiler prompt
  - A Retrieval Strategy Planner prompt
I wrote these prompts because I wanted to be helpful and I could find ZERO prompts out there to help with designing real memory systems in depth. I think that’s because the problem is so difficult.
So I built them myself, and I’m sharing them here. Dig in, enjoy, and good luck building a memory system for AI!
Grab the memory prompts now
These prompts convert unstructured conversations into reusable context that improves AI consistency.
The Memory Architecture Designer defines what information to store, where to store it, and for how long, ensuring critical facts remain accurate and accessible.
The Context Library Builder captures durable preferences, playbooks, and reference materials, eliminating repeated explanations. The Project Brief Compiler transforms scattered inputs into a verified, single source of truth with explicit scope.
The Retrieval Strategy Planner maps which information to surface during planning, execution, and review phases.
Together, these prompts reduce friction, minimize errors, and maintain continuity across sessions and tools—each conversation starts with better context and reaches useful output faster.
(The easy start prompt is at the end of the article)
The Real AI Breakthrough Isn’t Reasoning—It’s Memory
======================================================
Memory is perhaps the biggest unsolved problem in AI, and it’s one of the only problems in AI that is getting worse, not better. As we get better and better at intelligence, we get worse at memory, relatively speaking. In fact, there’s a name for it in the model maker community: the memory wall.
We’re not improving the hardware capabilities of our memory systems nearly as fast as we’re improving the ability of those chips to do inference, to compute, to generate. That creates a growing gap between our intelligence capabilities and our memory capabilities.
Here’s the core tension: AI systems are stateless by design, but useful intelligence requires state.
Every conversation starts from zero. The model has parametric knowledge—the weights we talk about—but it doesn’t have episodic memory. It doesn’t remember what happened with you. You have to reconstruct your context every single time. This isn’t a bug. It’s the architecture. Model makers want the system to be maximally useful at solving the next problem, and they can’t presume that state matters.
The promise of “memory features” is that vendors will magically solve this by making the system stateful in ways that are useful to you. But this creates a whole host of new problems: What should it remember? How long? When do you retrieve it? How do you update it?
These aren’t implementation details. They’re fundamental questions about what memory is and what it’s for when we do work. The gap between what’s promised, what’s delivered, and what’s actually needed has never been bigger.
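If you want to see what statelessness means in practice, here is a minimal Python sketch. Nothing in it is vendor-specific: `call_model` is a hypothetical stand-in for whatever chat API you use, and the brief text is invented. The point is simply that the brief and the history have to ride along on every single call, because the model keeps nothing.

```python
# Minimal sketch of why context must be reconstructed on every call.
# `call_model` is a hypothetical stand-in for your chat API of choice.

CONTEXT_BRIEF = """Audience: enterprise buyers. Voice: plain, direct.
Current project: Q3 onboarding revamp. Out of scope: pricing changes."""

def build_messages(question: str, history: list[dict]) -> list[dict]:
    """The model is stateless, so the brief and history ride along every time."""
    return (
        [{"role": "system", "content": CONTEXT_BRIEF}]
        + history
        + [{"role": "user", "content": question}]
    )

def call_model(messages: list[dict]) -> str:
    # Placeholder: swap in your vendor's chat-completion call here.
    return f"(model response to {len(messages)} messages)"

history: list[dict] = []
for question in ["Draft the kickoff email.", "Now tighten the subject line."]:
    messages = build_messages(question, history)
    answer = call_model(messages)
    # Persisting history is your job: nothing survives between calls otherwise.
    history += [{"role": "user", "content": question},
                {"role": "assistant", "content": answer}]
```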
Why This Matters Now
First, this is getting worse! But behind that, there’s a massive shift in the competitive landscape happening now.
The competitive shift: Memory is moving from “nice to have” to “core competitive advantage” faster than most teams realize. Organizations with intentional memory systems are shipping 2-3x faster—not because their AI is smarter, but because they’ve stopped losing context. Every conversation compounds instead of starting from zero.
What’s actually happening (beyond the hype):
Gemini (Google) now offers million-token context windows and is rumored to be building cross-thread memory retrieval into the product. Expect tighter integration with Google Workspace memory by Q1 2025.
Claude (Anthropic) added chat history search and optional automatic memory for Team/Enterprise tiers. The bet: let users control what persists across conversations.
Cursor (for developers) shipped Plan Mode—a separate retrieval mode for architectural thinking vs. execution. They’ve recognized that memory needs differ by task type, not just by volume.
The pattern: Every major player is racing to solve memory, but they’re solving different slices of it. ChatGPT Memory won’t talk to Claude. Cursor’s memory bank is great—until you need to switch editors. No single vendor has the full solution yet.
And critically, you can’t outsource memory to a vendor, because vendors don’t understand your context well enough to truly solve it. (And why should you turn your memory over to a single vendor anyway?)
Build vs. buy framework:
Use vendor solutions when:
- Your context is general enough to share with the platform
- You work solo or in small teams
- Lock-in risk is acceptable
Build custom memory infrastructure when:
- Context is proprietary and compounds over time (client relationships, domain expertise, system architecture)
- You need audit trails and governance (compliance, security)
- Retrieval needs are complex (mode-aware, multi-source, high precision on facts)
The risk equation: Bad memory = context drift, hallucinated facts, privacy leaks, wasted cycles re-explaining. Good memory = compounding leverage, consistent outputs, faster iteration, portable intelligence.
What’s coming: Expect every major AI tool to ship memory features in the next 6 months. The principles below help you evaluate what’s real versus vapor, and build systems that work across platforms.
Memory Is Two Acts
Ok, as we move forward, this is the foundation everything else builds on:
Act 1: Building your context library. This is what you want the system to remember—your preferences, your standards, project facts, domain knowledge, past decisions. It’s deliberate curation, not passive accumulation.
Act 2: Pulling the right slice. Retrieval that actually works. Not “dump everything into the context window,” but selective, precise fetching of what matters for the current task.
Everything else—embeddings, vector stores, context windows, RAG pipelines—is infrastructure supporting these two acts.
The mistake most people make: they focus on infrastructure (bigger context windows! better embeddings!) without being intentional about what they’re storing or how they’re retrieving it. Result: more context, same problems.
The path forward: understand why the problem persists, then apply principles that actually solve it.
The Problem
Fundamentally, AI systems are bad at memory for reasons we don’t usually talk about. I’m laying these out here because I think it’s important to name the real reasons why no one has yet shipped an adequate memory solution.
All six of these are separate root causes, but they are interrelated. Think of them as a big nasty ball of twine—the problem space is extremely entangled, and that’s why it’s hard to solve.
1. The relevance problem is unsolved
What’s “relevant” changes based on:
- What task you’re doing (planning vs. executing)
- What phase of work (exploration vs. refinement)
- What scope you’re in (personal vs. project)
- What changed since last time (state deltas)
Semantic similarity is a proxy for relevance, not a solution. “Find similar documents” works until you need “find the document where we decided X” or “ignore everything about Client A right now, but pay attention to clients B, C, and D.”
There’s no general algorithm for relevance. It requires human judgment about task context, which means it requires architecture, not just better embeddings. That’s why one-stop-shop vendors often struggle with real implementations.
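To make that concrete, here is a toy sketch of the difference. The documents and fields are invented; the point is that “find the decision about X, ignoring Client A” is a metadata filter plus recency ranking, not a similarity score.

```python
# Toy sketch: relevance as filters + task context, not just similarity.
# Documents and fields are invented for illustration.
from datetime import date

docs = [
    {"kind": "decision", "client": "B", "date": date(2025, 3, 2),
     "text": "Decided to ship the onboarding revamp without SSO in v1."},
    {"kind": "notes", "client": "A", "date": date(2025, 3, 9),
     "text": "Brainstorm about onboarding, SSO, and pricing tiers."},
    {"kind": "decision", "client": "A", "date": date(2025, 1, 15),
     "text": "Decided to keep pricing unchanged for Q1."},
]

def find(docs, kind=None, exclude_clients=(), contains=""):
    """Filter first (task context), then rank by recency."""
    hits = [d for d in docs
            if (kind is None or d["kind"] == kind)
            and d["client"] not in exclude_clients
            and contains.lower() in d["text"].lower()]
    return sorted(hits, key=lambda d: d["date"], reverse=True)

# "Find the decision about SSO, ignoring Client A right now."
print(find(docs, kind="decision", exclude_clients={"A"}, contains="SSO"))
```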
2. The persistence/precision tradeoff
If you store everything, retrieval becomes noisy and expensive. You jam up your context window.
If you store selectively, you lose information you’ll need later.
If you let the system decide what to keep, it optimizes for the wrong things: recency, frequency, and statistical salience rather than actual importance.
Human memory solves this through the technology of forgetting. We use incredibly lossy compression with emotional and importance weighting. AI systems don’t have that. They either accumulate or they purge, but they don’t naturally decay.
Forgetting is a useful technology for us. AI has nothing like it.
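If you want something like forgetting, you have to build it yourself. Here is one hedged sketch of a decay policy; the half-life, the importance weights, and the threshold are all made-up illustrations, not recommendations.

```python
# Sketch of a forgetting policy AI systems don't give you by default.
# Half-life, weights, and threshold are arbitrary illustrations.
import math
from datetime import datetime, timedelta

HALF_LIFE_DAYS = 30  # recency decays; importance does not

memories = [
    {"text": "Client B's fiscal year ends in January.", "importance": 0.9,
     "last_used": datetime.now() - timedelta(days=200)},
    {"text": "We tried a carousel hero image; it flopped.", "importance": 0.6,
     "last_used": datetime.now() - timedelta(days=45)},
    {"text": "Lunch order from the March offsite.", "importance": 0.1,
     "last_used": datetime.now() - timedelta(days=10)},
]

def retention_score(m):
    age_days = (datetime.now() - m["last_used"]).days
    recency = math.exp(-age_days * math.log(2) / HALF_LIFE_DAYS)
    # Important facts survive even when stale; trivia decays quickly.
    return m["importance"] * 0.8 + recency * 0.2

kept = [m for m in memories if retention_score(m) > 0.25]
for m in kept:
    print(round(retention_score(m), 2), m["text"])
```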
3. The single-context-window assumption
Vendors keep trying to solve memory by making context windows bigger. But volume isn’t the issue—structure is.
A million-token context window full of unsorted context is worse than a 10,000-token window with precisely curated content. The model still has to find what matters, parse relevance, ignore noise. You haven’t solved the problem, you’ve made it more expensive.
I know people who don’t budget their API calls and wonder why their bill is high. It’s because they’re stuffing the context window hoping volume solves precision. It doesn’t.
The real solution requires multiple context streams with different lifecycles and retrieval patterns. But that breaks the simple mental model of “just talk to the AI.”
4. The portability problem
Every vendor builds proprietary memory layers because they think memory is a moat. ChatGPT Memory, Claude’s recall, Cursor’s memory bank—none of these are interoperable.
Users invest time building up memory in one system, then switching costs become real. You can’t port “what ChatGPT knows about me” to Claude. Your memory is locked in.
This discourages users from building proper context libraries because “the tool will handle it.” Then the tool changes, or you switch tools, and you’re starting over.
If you’re building business memory systems, you must solve the portability problem. It’s a liability to be single-model. But vendors aren’t incentivized to make memory truly portable either.
5. The passive accumulation fallacy
Most memory features assume: “Just use the AI normally, and it will figure out what to remember.”
This fails because:
- The system can’t distinguish preference from fact
- It can’t tell project-specific from evergreen context
- It can’t know when old information is stale
- It optimizes for continuity, not correctness (“keep the conversation going”)
Useful memory requires active curation—deciding what to keep, what to update, what to discard. That’s work. Vendors promise passive solutions because active curation doesn’t scale as a product.
But passive accumulation doesn’t solve the problem either. And this costs us billions of dollars at the enterprise level while frustrating users personally and professionally.
6. Memory is actually multiple problems
When people say “AI memory,” they’re conflating:
- Preferences (how I like things done) → key-value, persistent
- Facts (what’s true about entities) → structured, needs updates
- Knowledge (domain expertise) → parametric, embedded in weights
- Episodic (what we discussed) → conversational, temporal, ephemeral
- Procedural (how we solved this before) → exemplars, success traces
Each needs different storage, retrieval, and update patterns. Treating them as one problem guarantees you solve none of them well.
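One way to see why these can’t share a single mechanism is to look at how each gets updated. Here is a hedged sketch in Python; the type names and fields are invented for illustration.

```python
# Sketch: each memory type wants a different update pattern.
# Type names and fields are invented for illustration.
from datetime import datetime

preferences = {}        # key-value: overwrite in place
facts = {}              # structured: update, but keep a timestamp for staleness
episodes = []           # episodic: append-only, expires with the conversation
exemplars = []          # procedural: append successful traces, tag for reuse

def remember_preference(key, value):
    preferences[key] = value                      # latest value wins

def remember_fact(entity, field, value):
    facts[(entity, field)] = {"value": value, "as_of": datetime.now()}

def remember_episode(turn):
    episodes.append(turn)                         # never edited, just logged

def remember_exemplar(task, approach, outcome):
    exemplars.append({"task": task, "approach": approach, "outcome": outcome})

remember_preference("tone", "plain and direct")
remember_fact("Acme Corp", "renewal_date", "2026-01-31")
remember_episode("User asked for a shorter subject line.")
remember_exemplar("launch email", "pain-point lead", "62% open rate")
```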
Vendors solve infrastructure, not architecture. Bigger windows, better embeddings, cross-chat search—these are scaling improvements, not structural solutions.
Users expect passive solutions to active problems. “Just remember what matters” requires judgment about what matters. That can’t be fully automated.
Memory requires architecture (deliberate separation, multiple stores, mode-aware retrieval), but vendors sell features (passive accumulation, bigger windows, universal search). That’s the gap. That’s why this persists.
How to Solve: The 8 Principles
Ok enough of the doom and gloom. How do we move forward here? What scalable principles enable us to unlock this problem?
I’m constructing and laying out these principles so that they work whether you’re a power user with ChatGPT or a developer building agentic systems. They’re tool-agnostic, they scale with complexity, and they solve the actual problem.
1. Memory is an architecture, not a feature
You cannot wait for vendors to solve this. Every tool will have memory capabilities, but they’ll solve different slices. You need principles that work across all of them.
This means you architect memory as a standalone system that works across your whole tool set. Vendors give you capabilities. You design the architecture that makes those capabilities useful to you.
The discipline: treat memory as something you build and maintain, not something you passively accumulate through tool usage.
2. Separate by lifecycle, not by convenience
Personal preferences (permanent) ≠ project facts (temporary) ≠ session state (ephemeral).
Mixing these is what breaks memory. Your writing style shouldn’t change when you switch projects. Your client’s name shouldn’t bleed into unrelated work.
The discipline lies in keeping these apart cleanly. At a small scale, this might be as simple as updating your system prompt separately from project briefs. At larger scale, it’s separate datastores with different retention policies.
If you’re designing agentic systems, it’s the same principle: separate permanent facts, project-specific facts, and session state. Don’t mix lifecycle concerns.
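If you do want to sketch this in code, here is a minimal illustration of the idea (the store names and contents are invented): three separate stores with three different lifetimes, so ending a project never touches your preferences.

```python
# Sketch: separate stores by lifecycle so clearing one never touches another.
# Store names and contents are illustrative, not a prescribed schema.

preferences = {        # permanent: survives every project and session
    "voice": "plain, direct, no hype",
    "format": "short paragraphs, bolded takeaways",
}

project = {            # temporary: lives and dies with the engagement
    "client": "Acme Corp",
    "deadline": "2025-11-30",
    "out_of_scope": ["pricing changes"],
}

session = {            # ephemeral: current conversation only
    "draft_version": 3,
    "open_question": "Does legal need to review the FAQ?",
}

def close_project():
    """End of engagement: project facts go, preferences stay."""
    project.clear()

def end_session():
    session.clear()

close_project()
print(preferences)  # untouched: your style didn't change because a project ended
```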
3. Match storage to query pattern
You need multiple stores because different questions require different retrieval:
- “What’s my style?” → key-value
- “What’s the exact client ID?” → structured/relational (SQL, tables)
- “What similar work have we done?” → semantic/vector (embeddings)
- “What did we try last time?” → event logs
Trying to do everything with one storage pattern fails predictably. When you need an exact client ID, hitting a vector store is slow and unreliable. When you need “similar past designs,” a key-value store can’t help you.
This is why when people say “we have our data lake and it’s gonna be a RAG,” I ask: why? RAG works for semantic recall. It doesn’t work for exact lookups, for joining data, for filtering on precise conditions.
Match the store to the query.
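Here is a small sketch of the same idea in Python, using only the standard library. The key-value and SQL lookups are real patterns; the “semantic” search is faked with keyword overlap where you would normally use embeddings.

```python
# Sketch: three stores, three query patterns. The vector step is faked with
# keyword overlap; in practice you'd use embeddings there.
import sqlite3

# 1. Key-value: "What's my style?"
prefs = {"style": "plain, direct, no hype"}

# 2. Structured/relational: "What's the exact client ID?"
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE clients (name TEXT, client_id TEXT)")
db.execute("INSERT INTO clients VALUES ('Acme Corp', 'AC-10492')")

# 3. Semantic-ish: "What similar work have we done?"
past_work = ["Onboarding email sequence for a fintech client",
             "Pricing page rewrite for a devtools startup"]

def similar(query):
    words = set(query.lower().split())
    return max(past_work, key=lambda doc: len(words & set(doc.lower().split())))

print(prefs["style"])                                              # exact key
print(db.execute("SELECT client_id FROM clients WHERE name = ?",
                 ("Acme Corp",)).fetchone()[0])                    # exact fact
print(similar("onboarding emails"))                                # fuzzy recall
```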
4. Mode-aware context beats volume
More context is not better context.
Planning conversations need breadth—alternatives, comparables, what’s worked before. You’re exploring solution space.
Execution conversations need precision—exact constraints, current status, no ambiguity.
Review or debugging needs trace history—what did we decide, why, what did we try that failed.
Retrieval strategy must match task type. Tools like Cursor’s Plan Mode recognize this: when you’re planning, it searches broadly across your codebase and past projects. When you’re executing, it pulls precise context for the current file.
You can apply this pattern even without sophisticated tooling: maintain separate context briefs for planning vs. execution. Your planning brief includes comparables, tradeoffs, open questions. Your execution brief includes only canonical facts and current constraints.
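A minimal sketch of what mode-aware briefs can look like, assuming a single shared library keyed by section (the keys and contents are invented): the mode decides which slices get pulled, not how much.

```python
# Sketch: the same library, sliced differently by mode.
# Keys and contents are illustrative.

library = {
    "comparables": "Past launches: A (email-first), B (webinar-first).",
    "tradeoffs": "Speed vs. polish; we usually choose speed.",
    "open_questions": "Do we localize at launch?",
    "canonical_facts": "Launch date 2025-11-30. Budget $40k. Owner: Dana.",
    "constraints": "No pricing changes. Legal review required for claims.",
    "decision_log": "2025-10-02: chose email-first. 2025-10-09: cut webinar.",
}

MODES = {
    "planning": ["comparables", "tradeoffs", "open_questions"],   # breadth
    "execution": ["canonical_facts", "constraints"],              # precision
    "review": ["decision_log", "canonical_facts"],                # trace history
}

def brief(mode):
    return "\n".join(f"{k}: {library[k]}" for k in MODES[mode])

print(brief("execution"))
```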
5. Build portable, not platform-dependent
Your memory layer should survive vendor changes, tool changes, model changes. If ChatGPT changes pricing, if Claude adds a feature, your context library should be retrievable regardless.
This is something almost nobody can say right now. People doing this well tend to be designing large-scale agentic systems at the enterprise level. But it’s a best practice for everyone.
It’s like keeping a go bag by the door: something portable that carries the relevant memory, so you can walk into a conversation with a different AI and be productive right away.
Keep your canonical context library external and model-agnostic. You should be able to copy-paste from this library into ChatGPT, Claude, Cursor, whatever you’re using. This is your memory, not the tool’s.
Vendor memory is a convenience layer, not your source of truth.
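A hedged sketch of what “portable” can mean in practice: the canonical library is a plain Markdown file you own, and any tool-specific use is just a copy of a slice. The file name and section headers here are one possible layout, not a standard.

```python
# Sketch: the canonical library is a plain file you own, not a vendor feature.
# File name and sections are just one way to lay it out.
from pathlib import Path

LIBRARY = Path("context_library.md")

LIBRARY.write_text("""# Context Library (canonical, tool-agnostic)

## Profile & Preferences (permanent)
- Voice: plain, direct, no hype
- Format: short paragraphs, bolded takeaways

## Work Playbooks (evergreen)
- Launch checklist: brief -> draft -> legal review -> ship
""")

def slice_for_paste(section_header):
    """Grab one section to paste into ChatGPT, Claude, Cursor, whatever."""
    text = LIBRARY.read_text()
    start = text.index(section_header)
    rest = text[start + len(section_header):]
    end = rest.find("\n## ")
    return (section_header + rest[: end if end != -1 else len(rest)]).strip()

print(slice_for_paste("## Profile & Preferences (permanent)"))
```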
6. Compression is curation
Do not upload 40 pages hoping the AI extracts what matters.
You need to do the compression work. Either in a separate LLM call or in your own work: write the brief, identify the key facts that matter, state the constraints. This is where judgment lives.
Memory is bound up in how we humans touch the work. There are ways to use AI to amplify and expand your judgment—you can use a precise prompt to extract information in a structured way from 40 pages of data, then in separate work figure out what to do with that data.
But it remains on you to make sure the facts are correct, the constraints are real, and the precision work you’re asking the AI to do with that data is the correct precision work.
The judgment in compression is human judgment. It may be human judgment you amplify with AI, but it remains human judgment.
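A sketch of how AI can amplify that judgment without replacing it. The extraction prompt is illustrative, the model call is left as a placeholder, and the review step is the part you cannot delegate.

```python
# Sketch: AI-assisted extraction, human-owned verification.
# The extraction prompt is illustrative; the model call is a placeholder.

EXTRACTION_PROMPT = """From the document below, return only:
1. Canonical facts (IDs, dates, amounts) with the sentence they came from
2. Hard constraints (quote the source line)
3. Decisions already made, with dates
Mark anything you are unsure about as UNVERIFIED.

DOCUMENT:
{document}
"""

def extract(document: str) -> str:
    # Placeholder for a model call; in practice, send EXTRACTION_PROMPT here.
    return EXTRACTION_PROMPT.format(document=document[:200])

def human_review(extracted: str) -> str:
    # The compression judgment stays with you: confirm facts before they
    # enter the context library, or they'll be wrong forever after.
    print("REVIEW BEFORE SAVING:\n", extracted)
    return extracted  # replace with an actual sign-off step in real use

brief = human_review(extract("...40 pages of meeting notes and PDFs..."))
```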
7. Retrieval needs verification
Semantic search recalls topics and themes well, but it fails on specifics. When you need exact IDs, numbers, dates, or relationships, precision suffers.
You need to pair fuzzy retrieval techniques like RAG with exact verification where facts must be correct. Two-stage retrieval: recall candidates, then verify against ground truth.
This is especially important where you have policy, financial, or legal facts to validate. There was a prominent fine levied against a major consulting firm, close to half a million dollars, because they couldn’t verify facts about court cases in a document they prepared. They hallucinated case citations and nobody caught it.
Retrieval failed. And because the LLM is designed to keep the conversation going, it just inserted something plausible.
You need to be able to verify retrieval against ground truth. If it’s a small task, that might be the human at the other end of the chat. If it’s a large agentic system, you need to do it automatically using an AI agent for evals.
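A minimal sketch of the verification pass in two-stage retrieval. The ground-truth table, client IDs, and amounts are invented, and the fuzzy recall step is assumed to have already produced the candidate snippets.

```python
# Sketch: recall fuzzily, then verify exact facts against ground truth.
# Candidates stand in for whatever your recall stage returned.
import re

ground_truth = {"AC-10492": {"client": "Acme Corp", "contract_value": 40000}}

candidates = [
    "Acme Corp (AC-10492) signed a $40,000 contract in October.",
    "Acme Corp (AC-10429) signed a $400,000 contract in October.",  # plausible, wrong
]

def verify(snippet):
    """Check every ID and dollar amount the snippet asserts."""
    ids = re.findall(r"AC-\d+", snippet)
    amounts = [int(a.replace(",", "")) for a in re.findall(r"\$([\d,]+)", snippet)]
    for cid in ids:
        if cid not in ground_truth:
            return False, f"unknown client id {cid}"
        if amounts and amounts[0] != ground_truth[cid]["contract_value"]:
            return False, f"amount mismatch for {cid}"
    return True, "ok"

for c in candidates:
    ok, reason = verify(c)
    print("PASS" if ok else "REJECT", "|", reason, "|", c)
```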
8. Memory compounds through structure
Random accumulation doesn’t compound. It creates noise.
Just adding stuff doesn’t compound. If we stored memories exactly as we experienced them, with no lossiness and no ability to forget, we would not be able to function as people.
In the same way that forgetting is a technology for us, structured memory is a technology for LLM systems.
Evergreen context goes one place. Versioned prompts go another. Tagged exemplars go another. At small scale, yes, you can do this: people are doing it as individuals with Obsidian, Notion, and other systems. And the same principle scales to a business.
With structured memory, each interaction builds on the last without degradation. Without it, you get random accumulation: the pile of transcripts you never got to, the one you tell yourself, “Well, this is data, we’re logging it, it’s probably good.” That isn’t structured memory that compounds. It’s noise.
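A small sketch of what structured accumulation can look like on disk; the folder names and tags are one possible layout, and the same shape works in Obsidian, Notion, or a plain repo.

```python
# Sketch: structure is what makes accumulation compound. Folder names are
# just one possible layout.
from pathlib import Path

ROOT = Path("memory")
PLACES = {
    "evergreen": ROOT / "evergreen",      # preferences, standards, playbooks
    "prompt": ROOT / "prompts",           # versioned prompts
    "exemplar": ROOT / "exemplars",       # tagged examples of work that landed
}

def file_memory(kind: str, name: str, text: str, tags=()):
    """New items land in a known place with known tags; unknown kinds fail loudly."""
    folder = PLACES[kind]                 # raises KeyError for unknown kinds, on purpose
    folder.mkdir(parents=True, exist_ok=True)
    header = f"tags: {', '.join(tags)}\n\n" if tags else ""
    (folder / f"{name}.md").write_text(header + text)

file_memory("exemplar", "q3-launch-email", "Subject line that got 62% opens...",
            tags=("email", "launch"))
```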
Bonus Quick Start Prompt: Design Your Own Memory System
Use this prompt with Claude or ChatGPT to design your memory system. Copy and paste:
I need to design a memory system for my AI work. Help me think through:
1. What are the different types of context I’m working with? 
   (Separate by lifecycle: permanent preferences, project facts, 
   session state)
2. What shape does each type need to be? 
   (Sharp instructions, structured data, long documents, event logs)
3. Where should each type live? 
   (Files, key-value stores, tables, embeddings, chat history)
4. How will I retrieve each type? 
   (Exact match, semantic search, filtered queries, mode-aware)
5. What’s my portability strategy? 
   (How do I move this memory between tools if needed?)
For each type of context I identify, ask me:
- How often does it change?
- How precisely do I need to retrieve it?
- What breaks if it’s wrong?
- Does it need to survive tool switches?
Then help me sketch a simple implementation plan that starts with 
the highest-leverage, lowest-effort pieces.
THIS PROMPT IS FOR YOU. ASK ONE QUESTION AT A TIME. RUN NOW.

This prompt applies the 8 principles to your specific situation. It forces you to separate by lifecycle, match storage to query pattern, think about portability, and design for retrieval that works.
The AI will ask clarifying questions. Answer them honestly. The goal isn’t a perfect system upfront—it’s intentional choices that compound over time.
Think of this prompt as the quick and easy version of the overall prompt system—something to get you started.
And remember, if you want the big prompts, they’re here
Start Here This Week
You don’t need to implement everything at once. Pick one move based on where you are:
If you’re just starting: Version three prompts this week. Save them somewhere you can grab them—Notion, Google Docs, a folder. Note what each one is for. This is Principle 5 (build portable) and Principle 8 (structure compounds).
If you have prompts but no system: Create a single “Context Library” page with two sections: Profile & Preferences (permanent), and Work Playbooks (evergreen). Write 3-5 bullets in each. Use it in your next three AI conversations. This is Principle 2 (separate by lifecycle).
If you’re working on a significant project: Write one project brief using this format: goal & audience, canonical facts (IDs, dates, metrics), scope and out-of-scope, prior decisions, deliverable format, acceptance criteria. Use it instead of uploading raw documents. This is Principle 6 (compression is curation).
If you’re building systems: Add a multi-store layer to your next project. Key-value for preferences, SQL for facts, vectors for recall. Keep them separate and see how retrieval precision improves. This is Principle 3 (match storage to query pattern).
If you’re evaluating vendors: Use the 8 principles as a scorecard. Ask: Does this tool let me separate by lifecycle? Can I export my memory? How does retrieval change by task type? Does it support verification of facts?
The pattern: start small, be intentional, compound from there. Memory systems that work are built incrementally, not architected perfectly upfront.
But they must be built. Passive accumulation doesn’t scale. Active curation does.
Remember why this matters: Consistent outputs across conversations and tools. Context that actually persists. Work that compounds instead of starting from zero every time.
Memory that works.
I make this Substack thanks to readers like you! Learn about all my Substack tiers here