What KongCode is, honestly

Every Claude Code session ends the same way: the context window closes, and everything the model understood about your project closes with it. Not the code. The code is in git. What evaporates is the judgment around the code: the correction it finally absorbed, the reason you rejected the obvious approach in March, the constraint that never became a comment because it only ever lived in conversation. Tomorrow you pay for the same lessons again.

KongCode is our answer to that. It’s a memory layer for Claude Code: a daemon that watches your sessions, extracts what looks durable, and writes it into a graph database on your own machine. The next session starts with the relevant pieces already back in context, before you’ve typed a word.

This is a review, not a launch post. We’re writing the version we’d want to read, which means the costs get equal billing. There are four of them, they’re real, and they don’t show up on anyone’s landing page. Including, until now, ours.

Storage was never the hard part

Saving everything is trivial. Transcripts are text, disk is effectively free, and you could log every session forever without noticing the space. But a pile of transcripts is not memory, for the same reason your shell history is not expertise. The hard problem is selection: deciding, out of forty thousand tokens of back-and-forth, which two hundred will still matter in a month.

A transcript records what happened. A memory is a bet about what will be worth knowing later.

KongCode places that bet on four kinds of material, roughly in order of signal:

Corrections. You told the model it was wrong. A human spent effort steering, which makes this the most expensive signal in the system and the first thing worth keeping.
Decisions with the reasoning attached. “We chose X” is trivia. “We chose X because Y fell over under Z” is what stops the same debate from happening twice.
Preferences. Terse commit messages, no mocks in integration tests, whatever you had to say once and should never have to say again.
Procedures that worked. A multi-step fix that actually landed is a skill, and the same shape of problem tends to come back.
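One way to picture the four kinds is as a small tagged record with a priority order. This is a sketch for illustration only: the names MemoryKind, Memory, and by_signal, and the field layout, are assumptions made for the example and not KongCode’s actual schema.

```python
from dataclasses import dataclass
from enum import IntEnum


class MemoryKind(IntEnum):
    # Lower value = stronger signal, following the order in the list above.
    CORRECTION = 0
    DECISION = 1
    PREFERENCE = 2
    PROCEDURE = 3


@dataclass(frozen=True)
class Memory:
    kind: MemoryKind
    text: str
    reasoning: str = ""  # the "because ..." part that makes a decision worth keeping


def by_signal(memories):
    """Order memories from strongest to weakest kind (stable within a kind)."""
    return sorted(memories, key=lambda m: m.kind)


notes = [
    Memory(MemoryKind.PREFERENCE, "Terse commit messages"),
    Memory(MemoryKind.CORRECTION, "That helper is synchronous; stop awaiting it"),
    Memory(MemoryKind.DECISION, "We chose X", reasoning="Y fell over under Z"),
]
ordered = by_signal(notes)
```

The only design point worth noting is that a decision carries its reasoning as a separate field, so a record with an empty reasoning is easy to spot as the "trivia" case.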

Extraction runs in the background, every few thousand tokens, with no tagging and no manual filing. That matters more than it sounds: a memory system you have to curate is a chore, and chores get abandoned by Thursday.

The obvious objection is that context windows keep growing, so why not just keep everything in the window? Because long-context models still pay attention unevenly, and “find the one correction that matters somewhere in a million tokens of history” is exactly the shape of task they handle worst. A bigger window moves the selection problem. It doesn’t remove it.

The machinery

Two processes. A long-lived daemon owns the database, the embedding model, and the reranker; a thin client inside each session talks to it over a local socket. Run three sessions at once and they share one daemon and one copy of a 420 MB embedding model, instead of paying for it three times.

When a prompt arrives, retrieval runs five stages:

1. Embed the prompt.
2. Pull candidate memories by vector similarity.
3. Score the candidates on seven weighted signals.
4. Rerank the survivors with a cross-encoder that reads each candidate against the actual question.
5. Walk the graph one hop out from the winners and bring back their neighbors.
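The first four stages can be sketched in a few lines, under heavy assumptions. The bag-of-words embed function stands in for the real embedding model, overlap stands in for the cross-encoder, only two made-up signals ("similarity" and "recency") replace the seven, and every name and default (rank, k, keep, top) is hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts. A real system would use a neural model."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(prompt, memories, signals, weights, rerank, k=10, keep=5, top=3):
    """memories: {id: text}. signals: {id: {signal name: value}}.
    weights: {signal name: weight}. rerank: fn(prompt, text) -> float."""
    query = embed(prompt)                                       # stage 1: embed
    sims = {mid: cosine(query, embed(text)) for mid, text in memories.items()}
    candidates = sorted(sims, key=sims.get, reverse=True)[:k]  # stage 2: candidates

    def weighted(mid):                                          # stage 3: weighted signals
        values = {"similarity": sims[mid], **signals.get(mid, {})}
        return sum(weights.get(name, 0.0) * value for name, value in values.items())

    survivors = sorted(candidates, key=weighted, reverse=True)[:keep]
    # stage 4: the reranker reads each survivor against the actual prompt
    return sorted(survivors, key=lambda mid: rerank(prompt, memories[mid]), reverse=True)[:top]


def overlap(prompt, text):
    """Stand-in for a cross-encoder: counts shared words."""
    return len(set(prompt.lower().split()) & set(text.lower().split()))


memories = {
    "m1": "use terse commit messages",
    "m2": "integration tests must not use mocks",
    "m3": "deploys run from the release branch",
}
signals = {"m1": {"recency": 0.9}, "m2": {"recency": 0.2}}
weights = {"similarity": 1.0, "recency": 0.1}
winners = rank("should integration tests use mocks", memories, signals, weights, overlap, k=2, top=1)
```

The shape matters more than the toy parts: a cheap, wide pass narrows the field, and the expensive reader only sees what survives it.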

Stage five is the one a flat vector store cannot do, and it’s the reason the graph exists. Similarity search finds text that resembles the question: good at paraphrase, bad at consequence. The decision that superseded the one you asked about, the bug that motivated the convention, the constraint sitting two edges away from the topic. None of those share surface form with your prompt, so similarity alone will never surface them. Edges will.
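A minimal sketch of that one-hop walk, assuming edges are stored as (source, type, target) triples. The edge type names and node ids here are invented for the example, and indexing both directions is a design assumption, made so that a winner can reach both what it supersedes and what supersedes it.

```python
from collections import defaultdict


def build_adjacency(edges):
    """edges: iterable of (source, edge_type, target). Index both directions."""
    adjacency = defaultdict(list)
    for source, edge_type, target in edges:
        adjacency[source].append((edge_type, "out", target))
        adjacency[target].append((edge_type, "in", source))
    return adjacency


def one_hop(winners, adjacency):
    """Return (neighbor, edge_type, direction) for every node exactly one edge
    away from a winner, skipping the winners themselves and duplicates."""
    seen = set(winners)
    found = []
    for node in winners:
        for edge_type, direction, other in adjacency.get(node, []):
            if other not in seen:
                seen.add(other)
                found.append((other, edge_type, direction))
    return found


edges = [
    ("d2", "supersedes", "d1"),        # a newer decision replaced d1
    ("conv3", "motivated_by", "bug7"),  # a convention that exists because of a bug
]
neighbors = one_hop(["d1", "conv3"], build_adjacency(edges))
```

Neither d2 nor bug7 needs to share a single word with the prompt. They come back because of where they sit, not what they say.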

There are 1166 tests holding the invariants in place, including one that checks every edge in the graph against its declared type before a change can ship. The retrieval design itself predates KongCode: it’s ported from kongclaw, a retrieval engine we built and benchmarked first, and on LongMemEval it scores 98.2% Recall@5. Two caveats, stated plainly: that is one benchmark, and the headline number assumes the cross-encoder is loaded. We think the result survives both caveats. You should still know they exist.
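For readers who haven’t met the metric, here is one common definition of Recall@k: of the items that should have come back for a query, the fraction that appear in the top k results, averaged over queries. This is a sketch of that definition only. Benchmarks sometimes use a stricter or looser variant (for example, counting a query as a hit if any relevant item appears), so treat it as an aid to reading the number, not a reproduction of the LongMemEval harness.

```python
def recall_at_k(ranked, relevant, k=5):
    """Fraction of the relevant ids that appear in the first k ranked ids."""
    wanted = set(relevant)
    if not wanted:
        raise ValueError("recall is undefined without relevant items")
    return len(set(ranked[:k]) & wanted) / len(wanted)


def mean_recall_at_k(queries, k=5):
    """queries: list of (ranked_ids, relevant_ids) pairs."""
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in queries) / len(queries)


queries = [
    (["a", "b", "c", "d", "e", "f"], ["a", "f"]),  # "f" is ranked sixth, so it misses the cut
    (["x", "y"], ["x"]),
]
score = mean_recall_at_k(queries, k=5)
```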

The ranking doesn’t stay generic, either. The seven signal weights ship hand-tuned, but a small learned model (ACAN, about 130K parameters) trains on which of your retrieved memories actually got used, and once there is enough history it takes over the ranking. Months in, the system is no longer ranking for some average user. It is ranking for you.
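The handover can be sketched as a scorer that falls back to fixed weights until enough feedback exists. To be clear about what is swapped in: a plain logistic regression stands in for ACAN here, since the text above does not describe its architecture. The MIN_SAMPLES threshold, the two-signal inputs, and all function names are hypothetical.

```python
import math

MIN_SAMPLES = 200  # hypothetical threshold for "enough history"


def hand_tuned_score(signals, weights):
    return sum(w * s for w, s in zip(weights, signals))


def train_logistic(samples, epochs=200, lr=0.5):
    """samples: list of (signals, used), with used in {0, 1}.
    Plain stochastic gradient descent on the log loss; returns (weights, bias)."""
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for x, y in samples:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def make_scorer(history, default_weights, min_samples=MIN_SAMPLES):
    """Use the shipped weights until there is enough usage history, then learn."""
    if len(history) < min_samples:
        return lambda signals: hand_tuned_score(signals, default_weights)
    w, b = train_logistic(history)
    return lambda signals: sum(wi * xi for wi, xi in zip(w, signals)) + b


# Toy history: memories strong on signal 0 were used, those strong on signal 1 were not.
history = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 5
shipped = [0.1, 0.9]  # the generic weights favor signal 1
early = make_scorer(history[:4], shipped, min_samples=10)
later = make_scorer(history, shipped, min_samples=10)
```

In this toy run, early still ranks by the shipped weights, while later has learned that this particular user’s behavior points the other way.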

The costs, itemized

Four of them. None disqualifying, but none optional either.

The token bill is invisible

Extraction needs a language model, and the daemon uses yours: it shells out to your authenticated Claude CLI on startup, every five minutes, and after each session ends, working through five to fifteen queued items per run. The economics are deliberate. Pay at write time, in the background, so that reads are instant, local, and free. But deliberate is not the same as visible, and nothing in the UI tells you it’s happening. If you live in a few long-running repos, the spend buys compounding context and is easy to justify. If you open Claude once a week for a shell script, you are buying memory nobody will ever read.
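Since nothing surfaces the spend, a back-of-the-envelope ceiling is worth having. The sketch below uses only the schedule described above (one run at startup, one every five minutes, one per session end, five to fifteen items per run). The tokens_per_item figure is a pure placeholder to be replaced with your own measurement, and the result is an upper bound: it assumes the queue has work on every run, which is unlikely in practice.

```python
def extraction_runs(daemon_hours, sessions_ended, interval_minutes=5):
    """One run at startup, one per interval while the daemon is up, one per session end."""
    periodic = int(daemon_hours * 60 // interval_minutes)
    return 1 + periodic + sessions_ended


def item_bounds(runs, low=5, high=15):
    """Ceiling on items processed, assuming every run finds a full queue."""
    return runs * low, runs * high


def token_bounds(runs, tokens_per_item):
    """tokens_per_item is a placeholder; measure it, do not trust a guess."""
    low, high = item_bounds(runs)
    return low * tokens_per_item, high * tokens_per_item


runs = extraction_runs(daemon_hours=8, sessions_ended=3)  # 1 + 96 + 3
ceiling = token_bounds(runs, tokens_per_item=1000)        # 1000 is illustrative only
```

For an eight-hour day with three sessions that works out to 100 runs and at most 500 to 1,500 items, whatever each item turns out to cost.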

It forgets on purpose

Extraction is quality-gated. Weak-signal material gets dropped before it reaches the graph, and that gate is the only thing standing between you and a memory full of noise. The price is twofold: the memory is lossy by design, and it isn’t deterministic. The same conversation, extracted twice, can keep different things depending on how strong the signals read. Forgetting is what keeps recall fast and relevant; every memory system worth the name makes this trade, biological ones included. But it means the gate will occasionally drop something you’d have kept, and you won’t find out until you go looking for it.
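The trade is easy to see in miniature. In this sketch the gate is a simple threshold, and the run-to-run variation of a model’s judgment is imitated with a little random jitter. The threshold value, the jitter, and the strength numbers are all invented for the example; the real gate’s criteria are not described here.

```python
import random

THRESHOLD = 0.6  # hypothetical cut-off


def gate(candidates, read_strength, threshold=THRESHOLD):
    """Keep only candidates whose signal strength reads at or above the threshold."""
    return [text for text in candidates if read_strength(text) >= threshold]


def noisy_reader(true_strength, rng, jitter=0.1):
    """Stand-in for a model-based judgment: the underlying strength plus run-to-run noise."""
    return lambda text: true_strength[text] + rng.uniform(-jitter, jitter)


true_strength = {
    "explicit correction": 0.9,
    "borderline remark": 0.6,
    "small talk": 0.2,
}

# Extract the same conversation fifty times and count what survives.
kept_counts = {text: 0 for text in true_strength}
for seed in range(50):
    reader = noisy_reader(true_strength, random.Random(seed))
    for text in gate(true_strength, reader):
        kept_counts[text] += 1
```

The strong signal is kept every time and the noise is dropped every time. Only the borderline item flips between runs, which is exactly where a dropped memory you would have kept comes from.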

The schema was learned on the job

We’ve written about this before, and it belongs in the review too: the database layer was vibe coded. More than fifteen distinct SurrealQL bugs in the version history, and the audits keep finding more. Queries that crashed on inputs we never guarded. Edge types declared one way in the schema and written another way in the code. A type-coercion bug in the supersede path that quietly corrupted rows until we caught it and healed the data live in 0.7.96. The schema is 817 lines, the smallest layer in the codebase, and it has produced an outsized share of the bugs. Which is exactly why it gets audited line by line, release after release: we know which part of this system we built by feel, and we treat it accordingly. The other sixteen thousand lines, the daemon and the extraction and the retrieval stack, were engineered with intent.
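The class of bug where an edge is declared one way and written another is the kind an invariant check can catch mechanically. Here is a sketch of such a check in Python rather than SurrealQL, with invented edge types and node kinds. It illustrates the idea behind checking every edge against its declared type and is not a copy of the project’s actual test.

```python
DECLARED = {
    # edge type: (source node kind, target node kind); all names hypothetical
    "supersedes": ("decision", "decision"),
    "motivated_by": ("convention", "bug"),
}


def edge_violations(nodes, edges, declared=DECLARED):
    """nodes: {id: kind}; edges: [(source_id, edge_type, target_id)].
    Returns a list of human-readable problems; empty means the graph conforms."""
    problems = []
    for source, edge_type, target in edges:
        if edge_type not in declared:
            problems.append(f"{source}->{target}: undeclared edge type {edge_type!r}")
            continue
        expected = declared[edge_type]
        actual = (nodes.get(source), nodes.get(target))
        if actual != expected:
            problems.append(f"{source}->{target}: {edge_type} expects {expected}, got {actual}")
    return problems


nodes = {"d1": "decision", "d2": "decision", "conv3": "convention", "bug7": "bug"}
good = [("d2", "supersedes", "d1"), ("conv3", "motivated_by", "bug7")]
bad = [("bug7", "motivated_by", "conv3"), ("d1", "replaces", "d2")]
```

Run as a release gate, a check like this fails loudly on the reversed motivated_by edge and on the undeclared replaces edge, instead of letting either reach stored data.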

It can die with one disk

Nothing touches the cloud on the read path. Embeddings, graph, retrieval, reranking: all of it local. That is the entire privacy posture, and it is also the durability problem, because your agent’s accumulated memory is a directory on one machine. No sync, no team sharing, no offsite copy unless you make one. There is a native export command in the box for exactly this purpose, and it does not run itself. Privacy and durability pull in opposite directions, and KongCode chose privacy. Plan your backups accordingly.
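If you want a second line of defense alongside the native export, a timestamped archive of the data directory written to another disk is the bluntest option. This is a generic file-level sketch and not the export command, whose name and flags are not given here. Both paths are placeholders you would have to supply, and archiving a database directory while the daemon is writing to it may capture an inconsistent state, so stop the daemon first or prefer the export.

```python
import shutil
import time
from pathlib import Path


def archive_directory(data_dir, backup_root, stamp=None):
    """Write data_dir into backup_root as memory-<stamp>.tar.gz and return the archive path.
    backup_root should live on a different disk and must not sit inside data_dir."""
    data_dir, backup_root = Path(data_dir), Path(backup_root)
    if not data_dir.is_dir():
        raise FileNotFoundError(data_dir)
    backup_root.mkdir(parents=True, exist_ok=True)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    base = backup_root / f"memory-{stamp}"
    return Path(shutil.make_archive(str(base), "gztar", root_dir=str(data_dir)))
```

Called from a scheduled job with your own two paths, this gives you dated snapshots. It does nothing about pruning old archives or verifying that a restore works, both of which a real backup plan needs.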

Who should run it

The case for: you work in long-lived repos, you’re tired of re-teaching the same context every morning, and you want the agent’s understanding of your project to compound instead of resetting to zero. The case against: you use Claude in short, disconnected bursts, in which case the daemon is overhead feeding a graph you will never query.

With the costs on the table, here is the claim we’ll stand behind. This is a serious attempt at the selection problem, not a transcript dump with embeddings on top: a retrieval pipeline with a benchmark behind it, a graph that earns its edges, a four-figure test suite, and a database layer we are hardening in public. Pre-1.0, priced in tokens, schema bugs and all, at github.com/42U/kongcode.