Rummaging through Claude Code's spring cleaning closet

Types as agent instructions, auto-generated verifiers, and experimental REPL mode.

Lots of people are talking about the Claude Code source code leak. Most of what’s been found has been covered already, and I don’t think it’s a big deal. Everything moves so fast, and I don’t think any secret sauce was revealed that changes the current trajectory. If anything, it’s funny: in a 500,000-line codebase, you’re basically looking through someone’s spring cleaning closet, finding objects that haven’t been touched in forever and holding them up as some kind of treasure.

With that in mind, there were three interesting things I haven’t seen mentioned.

1. Using types to communicate with an agent

I love types. I hope everyone loves types. And with AI writing the code, they matter even more.

One thing I noticed was a type being used in an unconventional way. There’s an analytics type called AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS, whose name instructs Claude to make sure an analytics event isn’t accidentally including code or file paths.

type AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS = never

logEvent('search', {
  query: userQuery as AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS
})

To me this is a novel use of types. Rather than “this is a type, this is how the code should behave,” it’s communicating to the agent: this is a contract, abide by it for downstream purposes. It’s the strongest form of comment in the code. Claude might ignore a comment. But a type checker is going to make it think twice.
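The leaked snippet uses a bare type alias, but the same contract can be made mechanically enforceable with a branded type, so the cast itself becomes the “I checked this” signature. A minimal sketch of that idea, with illustrative names that are not from the actual Claude Code source:

```typescript
// Hypothetical sketch: a branded string type that the logger requires,
// so a raw string is a type error until someone explicitly signs off.
// All names here are illustrative, not from the leaked source.
type VerifiedMetadata = string & { __brand: "verified_not_code_or_filepaths" };

// Mark a value only after a human (or agent) has actually checked it.
function markVerified(value: string): VerifiedMetadata {
  return value as VerifiedMetadata;
}

// The logger accepts only branded values, forcing the contract at
// every call site rather than relying on a comment being read.
function logEvent(name: string, meta: { query: VerifiedMetadata }) {
  return { name, meta };
}

const userQuery = "how do I center a div";
const event = logEvent("search", { query: markVerified(userQuery) });
```

The brand exists only at compile time; at runtime the value is still a plain string, so this costs nothing in the emitted JavaScript.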

2. Auto-generated verifiers

I think people are currently underrating verification: how much of it you need and how sophisticated your test verification can get. In the past, something like an end-to-end test was expensive to set up and maybe didn’t pay off. But now it’s more valuable than ever.

It looks like Claude Code has an internal skill called /init-verifiers that sets up a lot of the scaffolding for you. It detects what type of environment you’re running in and what type of code you’re building, then scaffolds out end-to-end-style tests. Web apps get Playwright. APIs get HTTP assertions. CLIs and servers get tmux.
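The actual detection logic isn’t shown in the leak I’ve seen, but here’s a sketch of how a /init-verifiers-style skill could pick a strategy by inspecting the project, with all names and heuristics being my assumptions:

```typescript
// Hypothetical sketch of /init-verifiers-style environment detection:
// look for signals in the project and choose a verifier strategy.
// The dependency heuristics are illustrative, not the real implementation.
import * as fs from "fs";
import * as path from "path";

type VerifierKind = "playwright" | "http" | "tmux";

function detectVerifier(projectDir: string): VerifierKind {
  const pkgPath = path.join(projectDir, "package.json");
  if (fs.existsSync(pkgPath)) {
    const pkg = JSON.parse(fs.readFileSync(pkgPath, "utf8"));
    const deps = { ...pkg.dependencies, ...pkg.devDependencies };
    // A front-end framework suggests browser-level (Playwright) tests.
    if (deps["react"] || deps["next"] || deps["vue"]) return "playwright";
    // A server framework suggests HTTP-assertion tests.
    if (deps["express"] || deps["fastify"]) return "http";
  }
  // Otherwise, fall back to driving the program in a terminal session.
  return "tmux";
}
```

Once the kind is known, the skill can scaffold the matching test harness without the user wiring anything up by hand.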

These verifiers push out how far the agent can go before you need to check on it. These aren’t unit tests on a tiny chunk of code, or tests that have been mocked and patched to hell. This is: the code should do X. Confirm it. It’s almost like a human going through a checklist of what that code should actually do.
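For the API case, the kind of check such a skill might scaffold is a real request against the running service, with no mocks in between. A sketch, where the endpoint and expected status are illustrative:

```typescript
// Hypothetical sketch of a scaffolded end-to-end check for an API project:
// hit the live endpoint and assert on observed behavior. The URL and
// expected status below are illustrative, not from the leaked source.
import * as http from "http";

function checkEndpoint(url: string, expectStatus: number): Promise<boolean> {
  return new Promise((resolve, reject) => {
    http
      .get(url, (res) => {
        res.resume(); // drain the body so the socket can close
        resolve(res.statusCode === expectStatus);
      })
      .on("error", reject);
  });
}

// Usage against a locally running dev server (assumed to exist):
// checkEndpoint("http://localhost:3000/health", 200).then((ok) => {
//   if (!ok) throw new Error("health check failed");
// });
```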

3. Experimental REPL mode

Lastly, there’s an experimental REPL mode. It’s basically their implementation of RLMs (recursive language models), an idea that seems to have legs, but we’ll see.

The thinking: instead of one long stateful chain of text with many different tools that might bloat your context, you give the agent a REPL playground to execute within. This separates out two layers of context. The first is what we’re used to now, which should stay clean and high quality. The second is where tool calls and anything that generates a ton of context get batched and executed. Primitive tools (file reads, edits, bash) get hidden and everything routes through the REPL instead.

The interesting implication is that this is a form of the agent managing its own context. Right now, everything accumulates in one long chain and it’s hard for the model to decide what’s worth keeping. With a two-layer system, the agent can work in the REPL, decide what matters, and surface only the relevant results back to the main context. That’s a meaningful shift from the current “everything piles up” default.
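The two-layer idea can be sketched in a few lines: bulky tool output stays in a scratch layer, and only a short, curated summary is appended to the main transcript. Everything here is my illustration of the concept, not the leaked implementation:

```typescript
// Hypothetical sketch of two-layer context: heavy tool output lives in
// a REPL scratch layer; the main conversation only sees a summary.
// All names and the truncation rule are illustrative.
type Message = { role: "user" | "assistant" | "tool"; content: string };

class ReplLayer {
  private scratch: string[] = []; // bulky output accumulates here, out of band

  run(code: string, execute: (code: string) => string): string {
    const output = execute(code);
    this.scratch.push(output);
    // Surface only a truncated view to the main context.
    return output.length > 200
      ? output.slice(0, 200) + "… [truncated]"
      : output;
  }
}

const mainContext: Message[] = [];
const repl = new ReplLayer();

// A 10,000-character file read happens in the REPL layer (stubbed here)...
const summary = repl.run("read('big.log')", () => "x".repeat(10_000));
// ...but the main context only grows by a couple hundred characters.
mainContext.push({ role: "tool", content: summary });
```

In a real system the agent, not a fixed truncation rule, would decide what to surface, which is exactly the “managing its own context” shift described above.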

This is another strategy for context isolation, similar to using subagents, and something to look out for.