We keep calling it memory

A pattern keeps repeating: open Claude Code, Codex, Cursor, or whatever harness you like, and someone is bolting a SQLite database or a vector database onto the side of it. Every conversation gets logged. Every file the agent touched gets indexed. A keyword search runs against the log when the agent needs to “remember” something. People call this memory.

The bolt-ons aren’t wrong. They solve real problems. The context window fills up before most non-trivial tasks finish. Each new session starts cold. Two agents working on the same codebase can’t see each other’s notes. Search over old conversations is useful, grepping the logs is useful, vector search over files is useful. I run all of these myself.

All of it is explicit: information the model has to be shown again every time it matters, sitting outside the weights. We aren’t training frontier models here, so the question is what explicit memory can actually be made to do, and where it hits a ceiling.

What the bolt-ons aren’t is memory. They’re a filing cabinet. Durable, retrievable, queryable, and structurally nothing like the thing your brain does when you remember where you left your keys.

That distinction matters because we’re about to spend the next decade building something we’re already calling “agent memory,” and the analogy we picked at the start is going to constrain what gets built. So before we get too far into it I want to spend a post asking what memory is, in the disciplines that have been studying it longest.

What memory actually is

Cognitive science doesn’t talk about memory as a single thing. It talks about memory as a family of distinct operations, each with its own neural substrate, its own time constants, and its own failure modes.

Memory
├── Declarative
│   ├── Episodic            events with time and place
│   └── Semantic            facts stripped of context
└── Non-declarative
    └── motor skills, conditioning, priming, ...

Working memory              a parallel system that manipulates
                            its contents rather than storing them

Schematic view. Real memory has sub-types within sub-types, leaky boundaries, and traffic between systems the tree can’t show.

The first cut, due to Larry Squire,¹ is declarative versus non-declarative memory. Declarative is the stuff you can put into words: facts and events. Non-declarative is everything else: motor skills, conditioning, priming. Tying your shoes is non-declarative. Knowing that Paris is the capital of France is declarative. They depend on different brain systems. Patients with hippocampal damage lose the ability to form new declarative memories but can still learn motor skills.

Endel Tulving² split the declarative side again into episodic and semantic. Episodic is memory for events with a time and a place attached: the smell of the kitchen the morning you got your acceptance letter. Semantic is memory for facts stripped of their original context: you know Paris is the capital of France, but you don’t remember where you were when you learned it.

Working memory is something else again. Alan Baddeley³ argued that “short-term memory” was being asked to do too much, and proposed a multi-component model: a phonological loop for verbal material, a visuospatial sketchpad for images, an episodic buffer for binding things together, and a central executive that manages attention across them. The point isn’t the specific architecture. The point is that working memory isn’t just “memory with a short half-life.” It’s a different operation altogether, one that manipulates its contents rather than just storing them.

The classic structure⁴ is sensory then short-term then long-term, with rehearsal moving things along. Modern accounts complicate this in every direction. But the broad shape holds: there are multiple stores, with different purposes, and the interesting action is in the transitions between them.

For an agent system, the upshot is uncomfortable. Every word in the previous four paragraphs describes an operation, not a table. Memory in the cognitive sense is a process of writing, transforming, consolidating, retrieving, and forgetting, with each step doing different work. A SQLite log captures none of those operations. It captures the inputs and lets you query them later. That’s transcription, not memory.

What an agent has today

Look at any current agent harness and what it has, mapped to the taxonomy above:

Cognitive system	What it does	Closest agent analog
Working memory	Holds and manipulates a bounded set of items	Context window
Episodic memory	Records events with time and place, integrates with prior knowledge	Conversation log (transcript only, no integration)
Semantic memory	Stores facts stripped of context, reorganized by use	RAG store (static corpus, no reorganization)
(Dynamics)	Decay, consolidation, replay, strengthening with use	(none)

The context window is the closest thing to working memory: bounded, actively manipulated, lost when the session ends. It’s the only piece of the picture that current systems do well.

The conversation log is a transcript. It’s not episodic memory. Episodic memory has temporal structure, integration with what was already known, and forgetting. A transcript has none of those. It has timestamps and exact text.

A RAG store is not memory either. RAG is information retrieval over a corpus. It’s a useful thing, but calling it long-term memory is a category error: the corpus isn’t constructed by experience, it doesn’t decay, and it doesn’t reorganize itself with use. It’s a library card catalog wearing a memory costume.

What’s missing is the dynamics. Decay. Consolidation. Strengthening with use. Reorganization through replay. These aren’t optional features you’d add for polish. In biological memory, they’re the things that make the rest of the system work.

The primitives that make it memory

Cognitive science has been formalizing the dynamics for decades. A small list of the primitives that matter:

Hebbian learning. Donald Hebb (1949) proposed that when one cell repeatedly contributes to firing another, the connection between them strengthens. Often summarized as “cells that fire together, wire together,” which is a slight misquote of a more careful claim. The principle is that co-occurrence in time is itself a learning signal, no error gradient required. For agent memory, this means that two pieces of information accessed together should be more likely to be retrieved together in the future, even if no human ever told the system they were related.
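A minimal sketch of what that could look like in an agent's store, assuming a hypothetical `CoAccessGraph` that is told which item IDs were retrieved together. The class name, the learning rate, and the saturating update rule are illustrative choices of mine, not Hebb's formulation and not any particular library's API.

```python
from collections import defaultdict
from itertools import combinations


class CoAccessGraph:
    """Hypothetical association store: links strengthen when items are accessed together."""

    def __init__(self, learning_rate: float = 0.1):
        self.learning_rate = learning_rate
        # weights[a][b] is the association strength between a and b, in [0, 1).
        self.weights = defaultdict(lambda: defaultdict(float))

    def record_access(self, item_ids):
        """Strengthen every pair of items that appeared in the same retrieval."""
        for a, b in combinations(sorted(set(item_ids)), 2):
            w = self.weights[a][b]
            # Saturating update: move a fraction of the remaining distance toward 1.
            new_w = w + self.learning_rate * (1.0 - w)
            self.weights[a][b] = new_w
            self.weights[b][a] = new_w

    def associates(self, item_id, top_k=3):
        """Return up to top_k (item, strength) pairs most strongly linked to item_id."""
        linked = self.weights.get(item_id, {})
        return sorted(linked.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


graph = CoAccessGraph()
graph.record_access(["auth.py", "session.py"])
graph.record_access(["auth.py", "session.py", "README.md"])
print(graph.associates("auth.py"))  # session.py ranks above README.md
```

Nobody labeled `auth.py` and `session.py` as related. The link exists only because they kept showing up together, which is the whole point of the rule.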

Spreading activation. Collins & Loftus⁵ modeled semantic memory as a network where activating one concept partially activates connected ones. When you think doctor, the words nurse and hospital and stethoscope become slightly more accessible to you. Activation spreads, decays, and competes. This isn’t decoration. It’s how priming works, how associations are recalled, how concepts get connected without any one moment having connected them.
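Here is a rough sketch of the spreading step, under stated assumptions: the graph is a plain dict of weighted links, activation is attenuated by a fixed `decay` factor per hop, and anything below `threshold` stops propagating. The function name and all three parameters are hypothetical, and the competition between concepts that the Collins & Loftus account includes is left out.

```python
def spread_activation(graph, sources, decay=0.5, threshold=0.05, max_hops=3):
    """Propagate activation outward from source nodes over a weighted graph.

    graph:   dict mapping node -> dict of neighbor -> link weight in [0, 1]
    sources: dict mapping node -> initial activation
    """
    activation = dict(sources)
    frontier = dict(sources)
    for _ in range(max_hops):
        next_frontier = {}
        for node, energy in frontier.items():
            for neighbor, weight in graph.get(node, {}).items():
                passed = energy * weight * decay
                if passed < threshold:
                    continue  # too weak to matter; stop spreading along this link
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + passed
        if not next_frontier:
            break
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation


concepts = {
    "doctor": {"nurse": 0.8, "hospital": 0.6},
    "nurse": {"hospital": 0.7},
    "hospital": {"stethoscope": 0.4},
}
result = spread_activation(concepts, {"doctor": 1.0})
for concept, level in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{concept}: {level:.2f}")
```

Notice that stethoscope ends up slightly active even though nothing links it to doctor directly, and hospital ends up more active than nurse because it receives activation along two paths.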

Decay. Unrehearsed memories fade. The mathematical form is debated, but the broad shape is exponential or power-law. Decay is not a bug. It’s the operation that stops every memory from carrying the same weight regardless of its age, which would make the whole store useless. A memory system without decay is one where last year’s irrelevant lunch order has the same retrieval weight as this morning’s important conversation.
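To make the two candidate shapes concrete, here is a small sketch of both as retrieval weights. The function names, the seven-day half-life, the exponent of 0.5, and the one-hour scale are assumptions picked for illustration, not values taken from the literature.

```python
HOUR = 3600.0
DAY = 24 * HOUR


def exponential_weight(age_seconds, half_life_seconds):
    """Weight halves every half_life_seconds; old items vanish almost completely."""
    return 0.5 ** (age_seconds / half_life_seconds)


def power_law_weight(age_seconds, exponent=0.5, scale_seconds=HOUR):
    """Weight falls quickly at first, then keeps a long tail for old items."""
    return (1.0 + age_seconds / scale_seconds) ** -exponent


this_morning = 3 * HOUR
last_year = 365 * DAY

for label, age in [("this morning", this_morning), ("last year", last_year)]:
    exp_w = exponential_weight(age, half_life_seconds=7 * DAY)
    pow_w = power_law_weight(age)
    print(f"{label}: exponential={exp_w:.3g} power-law={pow_w:.3g}")
```

Either weight could be multiplied into a relevance score at retrieval time. The practical difference is the tail: with these parameters the exponential form makes last year's lunch order effectively unreachable, while the power-law form leaves it faint but recoverable.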

Consolidation. McClelland, McNaughton & O’Reilly⁶ argued for complementary learning systems: a fast hippocampal store that records specific experiences, and a slow neocortical store that integrates them into general knowledge over time. Consolidation is the process that moves things from the first to the second. In sleep, mostly. The fast store is high-resolution and forgets quickly. The slow store is low-resolution and stable. You need both.
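A very crude stand-in for the two-store idea, to show the shape rather than the neuroscience. Everything here is hypothetical: the `Episode` and `TwoStoreMemory` names, the one-day `max_age`, and the rule that a fact counts as stable once two expired episodes support it. It is not the complementary learning systems model, only a sketch of "detailed and short-lived" feeding "coarse and durable".

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    timestamp: float
    facts: frozenset  # what was observed in this one session


@dataclass
class TwoStoreMemory:
    fast: list = field(default_factory=list)  # recent episodes, kept in full detail
    slow: dict = field(default_factory=dict)  # fact -> count of supporting episodes

    def record(self, episode: Episode) -> None:
        self.fast.append(episode)

    def consolidate(self, now: float, max_age: float = 86_400.0) -> None:
        """Offline pass: fold expired episodes into the slow store, then drop them."""
        expired = [ep for ep in self.fast if now - ep.timestamp > max_age]
        for ep in expired:
            for fact in ep.facts:
                # Only the count survives; the episode's detail is forgotten.
                self.slow[fact] = self.slow.get(fact, 0) + 1
        self.fast = [ep for ep in self.fast if now - ep.timestamp <= max_age]

    def stable_facts(self, min_support: int = 2) -> set:
        """Facts that recurred across enough episodes to count as general knowledge."""
        return {fact for fact, n in self.slow.items() if n >= min_support}


memory = TwoStoreMemory()
memory.record(Episode(0.0, frozenset({"tests live in ./spec", "user prefers tabs"})))
memory.record(Episode(100.0, frozenset({"tests live in ./spec"})))
memory.record(Episode(200_000.0, frozenset({"deploy failed on Friday"})))
memory.consolidate(now=200_000.0)
print(memory.stable_facts())
```

Each episode is counted exactly once, at the moment it leaves the fast store, so running `consolidate` repeatedly does not inflate support. A real system would also need to transform what it promotes (summarize, generalize, resolve conflicts), which this sketch does not attempt.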

Confidence. Not all memories are equally trusted. The same event recalled twice can have different confidence the second time, and that confidence updates with corroboration. Bayesian models of memory treat each retrieval as evidence to be combined with prior strength.
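One simple way to sketch that update is a Beta-Bernoulli belief per stored claim. This is one common Bayesian choice that I am picking for brevity, not the specific model any of the cited work uses. The `Belief` class and its `weight` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    """Beta(alpha, beta) belief that a stored claim is true. Beta(1, 1) is uniform."""

    alpha: float = 1.0
    beta: float = 1.0

    def observe(self, corroborated: bool, weight: float = 1.0) -> None:
        """Treat a retrieval as evidence for or against the claim."""
        if corroborated:
            self.alpha += weight
        else:
            self.beta += weight

    @property
    def confidence(self) -> float:
        """Posterior mean probability that the claim is true."""
        return self.alpha / (self.alpha + self.beta)


claim = Belief()
claim.observe(corroborated=True)   # a second source agrees
claim.observe(corroborated=True)   # the agent re-verifies it against the code
claim.observe(corroborated=False)  # a later session contradicts it
print(round(claim.confidence, 2))  # 0.6
```

The `weight` argument is where source reliability could enter: a passing test might count for more than an offhand remark in a transcript.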

These five aren’t exhaustive. They’re a starting set. Jeff Hawkins’s A Thousand Brains (2021) and Numenta’s broader research line cut the problem differently, with memory tied to spatial and temporal reference frames built up across cortical columns. Mark Burgess’s In Search of Certainty (2015) and his earlier Promise Theory describe how reliable behavior emerges from agents working with incomplete information, which is most of what memory is. Information theory frames the same questions as channel capacity and signal recovery.

These traditions are conspicuously absent from how agent memory is currently being built. The closest formal model in mainstream use is John Anderson’s ACT-R, where memory activation is a function of recency and frequency, modulated by context. ACT-R has been around since the 1990s. The math is right there.
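For a sense of how little math is involved, ACT-R's base-level learning equation is B = ln(sum over past accesses of t^(-d)), where t is the time since each access and d is a decay parameter, conventionally 0.5. The sketch below implements only that term. The function name and the choice to return negative infinity for a never-accessed item are mine, and the context-dependent part of ACT-R activation is omitted.

```python
import math


def base_level_activation(access_times, now, decay=0.5):
    """ACT-R base-level term: ln of the summed, power-law-decayed past accesses."""
    ages = [now - t for t in access_times if now > t]
    if not ages:
        return float("-inf")  # never accessed before now: nothing to retrieve
    return math.log(sum(age ** -decay for age in ages))


now = 100.0
recent_and_frequent = base_level_activation([90.0, 95.0, 99.0], now)
old_and_rare = base_level_activation([10.0], now)
print(recent_and_frequent > old_and_rare)  # True
```

Recency and frequency both fall out of one sum: every access adds a term, and every term shrinks as it ages.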

Mind the gap

We’ve been arguing about memory since antiquity. About agency at least as long. The gap to working agent memory is enormous.

Footnotes

1. Squire, L. R. (1992). Memory and the hippocampus: A synthesis from findings with rats, monkeys, and humans. Psychological Review, 99(2), 195-231.
2. Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson (Eds.), Organization of Memory. Academic Press; Tulving, E. (1985). Memory and consciousness. Canadian Psychology, 26(1), 1-12.
3. Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), The Psychology of Learning and Motivation (Vol. 8). Academic Press.
4. Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In K. W. Spence & J. T. Spence (Eds.), The Psychology of Learning and Motivation (Vol. 2). Academic Press.
5. Collins, A. M., & Loftus, E. F. (1975). A spreading-activation theory of semantic processing. Psychological Review, 82(6), 407-428.
6. McClelland, J. L., McNaughton, B. L., & O’Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex. Psychological Review, 102(3), 419-457.
