Engineering intelligence

What Karpathy's LLM Wiki Means for Engineering Leaders

by Alina · Evaal AI



Karpathy's wiki post has been stuck in my head for the last two weeks, and most of the takes on it are missing what's actually new about it.

What I keep seeing online is some version of "RAG is dead" or "vector databases are over." Those aren't wrong exactly, but they're treating the post like an architectural argument when it isn't really one. He wasn't telling people to throw out their retrieval stack. He was showing what happens when you stop using an LLM as a question-answering machine and start using it as a long-running curator. His LLM is organizing his reading for him in the background, slowly, over weeks. The wiki gets better while he sleeps. That's the new role.

And it isn't a small distinction. Most LLM applications today, including most of what gets called RAG, treat each query as a discrete event: question goes in, answer comes out, the system has no memory or stake in what came before or what comes next. Karpathy gave his LLM a different job: watch this stream of stuff, write summaries, link related ideas, keep the corpus organized, and do all of this continuously and asynchronously. That's a categorically different use of the model. It's also, almost by accident, a much harder use, because the LLM has to make judgment calls about relevance and structure that don't come up when you're answering a single prompt.
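
To make the curator job concrete, here's a minimal sketch of that loop under some loud assumptions: a file-based markdown wiki, an inbox directory where new items land, and a `call_llm` stub standing in for whatever model API you use. None of this is Karpathy's actual setup; it's just the shape of the role.

```python
import time
from pathlib import Path

INBOX = Path("inbox")  # new items land here, one markdown file each (assumed layout)
WIKI = Path("wiki")    # the LLM-maintained wiki lives here

def call_llm(prompt: str) -> str:
    """Stub for whatever model API you use (OpenAI, Anthropic, a local model)."""
    raise NotImplementedError

def curate_batch() -> None:
    items = sorted(INBOX.glob("*.md"))
    if not items:
        return
    batch = "\n\n---\n\n".join(p.read_text() for p in items)
    titles = "\n".join(p.stem for p in WIKI.glob("*.md"))
    prompt = (
        "You maintain a personal wiki. Existing entry titles:\n"
        f"{titles}\n\n"
        f"New items:\n{batch}\n\n"
        "Summarize the new items, link them to related existing entries, and "
        "return one updated or new entry as markdown, starting with '# <title>'."
    )
    entry = call_llm(prompt)
    first_line = entry.splitlines()[0] if entry else ""
    title = first_line.lstrip("# ").strip() or "untitled"
    (WIKI / f"{title}.md").write_text(entry)
    for p in items:
        p.unlink()  # consumed; the wiki entry is now the record

if __name__ == "__main__":
    INBOX.mkdir(exist_ok=True)
    WIKI.mkdir(exist_ok=True)
    while True:               # long-running and asynchronous: no query, no answer
        curate_batch()
        time.sleep(6 * 3600)  # a slow-moving corpus only needs a few passes a day
```

The point of the sketch is the while loop: there's no query anywhere in it. The model runs on the corpus's schedule, not the user's.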

It works in his post because his domain is bounded and slow-moving. He's curating his own reading list, the corpus updates a few times a week, and the LLM has time to think between batches. Markdown and the specific prompt setup he used don't matter much; what's interesting is the role itself, and the role generalizes.

Once you see the role, the question becomes which other knowledge streams in your work life would benefit from continuous LLM curation. The answer turns out to be a lot of them. Anywhere a person is drowning in fast moving inputs and the actual job is synthesis, the curator pattern probably matters more than the answerer pattern that everyone has been building toward.

The example that keeps standing out to me is engineering management.

If you run a team of thirty or more engineers, the operational reality of those engineers is scattered across Linear, GitHub, Slack, Calendar, and probably one or two other tools. No human alive can hold a coherent picture of all that in their head at once. The current solution is dashboards (Datadog, Jellyfish, LinearB, a few others), and each of them measures something while none of them tells you what's going on, because what's going on is a synthesis question. For an engineering manager, that synthesis is most of the job.

This is the gap Karpathy's pattern fills, applied to a different corpus.

Picture the wiki he described, except instead of transformer architectures the entries are about what your engineering team is doing. The morning's edition might note that two features actually shipped this week, that one engineer hasn't committed code in three days and might want someone to check in, that a particular Linear ticket has changed scope four times and is likely to slip its deadline, and that a code review on a critical PR has been sitting unmerged for two days. There would also be a list of decisions that have been waiting on you for longer than you realized. The whole thing readable in five minutes.

An LLM curation layer plays that curator role for you, working quietly in the background: writing entries about what changed today, linking them back to last week, flagging the things that look smaller than they actually are. By the time you sit down with coffee, it's already done a pass. The thing you read is a brief, not a dashboard.
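
Here's a sketch of what that morning pass could look like, under the same caveats: `gather_signals` returns hardcoded stand-ins mirroring the examples above, where a real version would wrap the Linear, GitHub, and Slack APIs, and `call_llm` is again a stub.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Signal:
    source: str  # "github", "linear", "slack", ...
    text: str    # one observation from that stream

def call_llm(prompt: str) -> str:
    """Stub for whatever model API you use."""
    raise NotImplementedError

def gather_signals() -> list[Signal]:
    # Hardcoded stand-ins; a real version would query Linear/GitHub/Slack.
    return [
        Signal("github", "critical PR unreviewed and unmerged for 2 days"),
        Signal("github", "one engineer has no commits in 3 days"),
        Signal("linear", "ticket rescoped 4 times; deadline likely to slip"),
    ]

def write_brief(signals: list[Signal], last_week: str) -> str:
    bullets = "\n".join(f"- [{s.source}] {s.text}" for s in signals)
    prompt = (
        f"Today is {date.today()}. Last week's brief:\n{last_week}\n\n"
        f"Today's raw signals:\n{bullets}\n\n"
        "Write a five-minute morning brief for an engineering manager: what "
        "shipped, what's stuck, what's quietly slipping, and which decisions "
        "have been waiting on them. Link back to last week where relevant."
    )
    return call_llm(prompt)
```

The design choice that matters is the `last_week` argument: feeding the previous brief back in is what makes this curation rather than reporting, because each edition can link to and build on the one before it.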

The most viral demo in AI tooling this month was one application of a pattern that's far more useful in domains where humans are drowning. Engineering management is the cleanest example, with product management next and customer success and sales ops right behind. Anywhere the work is mostly synthesis and the inputs are mostly streams, an LLM in the curator role matters more than an LLM in the question-answering role.

Karpathy made a related argument in a talk last year, framing LLMs as collaborators rather than tools. The wiki post is one of the cleaner public illustrations of what he meant. The right takeaway isn't that RAG is over. It's that we've been making LLMs do the wrong job, and the eclipse of RAG is downstream of that.

Evaal AI builds engineering intelligence for teams of 30+. See how it works →