🔗 Link

How Claude Code actually uses CLAUDE.md files

Claude Code’s CLAUDE.md Implementation Detail

By @alwaysallison

The prompt used in Claude Code to load CLAUDE.md files is:

“…IMPORTANT: this context may or may not be relevant to your tasks. You should not respond to this context or otherwise consider it in your response unless it is highly relevant to your task. Most of the time, it is not relevant.”

Ahh, so this is why CLAUDE.md files feel so inconsistent. Turns out Claude has full discretion to completely ignore them.

Amazing that Claude Code works so well given the context loading is essentially “hey if this is useful, great, but feel free to completely ignore it”

Interesting follow-up in the replies about how @alwaysallison found this out: setting ANTHROPIC_BASE_URL to a proxy and sniffing out what Claude Code actually does via tool calls.

A good lesson for deeper understanding of your tools: always peel back the layers and show me the f*#!^ing prompt.
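If you want to try the same trick, here's a minimal sketch of such a proxy using only the stdlib (the port is arbitrary, and a real tool like mitmproxy handles streaming and errors far more gracefully - this one buffers whole responses, so expect things to feel laggy):

# Point ANTHROPIC_BASE_URL at a local server that dumps each request body
# (system prompt included) before forwarding it to the real API.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

UPSTREAM = "https://api.anthropic.com"

class ShowMeThePrompt(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        print(json.dumps(json.loads(body), indent=2))  # the prompt, at last
        headers = {k: v for k, v in self.headers.items()
                   if k.lower() not in ("host", "accept-encoding")}
        with urlopen(Request(UPSTREAM + self.path, data=body, headers=headers)) as resp:
            payload, status = resp.read(), resp.status
            content_type = resp.headers.get("Content-Type", "application/json")
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    # Then run: ANTHROPIC_BASE_URL=http://127.0.0.1:8080 claude
    HTTPServer(("127.0.0.1", 8080), ShowMeThePrompt).serve_forever()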

See the original tweet →

📄 Post

Homelab Update: The Paternity Leave Edition

It took 5 years, but I finally updated the jank of a homelab setup I had. All it took was paternity leave to create enough free time to finally knock it off the to-do list.

I was mostly in need of a repo cleanup and an upgrade to the backbone docker compose setup hosting most of my services.

Here’s a brief overview of the new things I found useful and fun.

Repo Cleanup and Improved Setup

The main thing here was cleaning up the house and making it easier to hack on in the future:

  • Long overdue update to docker compose from 3.4 to 3.8
  • Removed network mode from all services except Plex (which had issues)
  • Dev and prod versions for testing out new things
  • A central Makefile which acts as the backbone for most dev commands
  • Automated container/service upgrades weekly

New Additions

Homepage

Homepage provides a nice landing page that links through to everything you’re hosting. Clean, functional, and exactly what I needed for organizing all my services.

Overseerr

Overseerr provides a nice media request UX that hooks up to the *arrs and Plex nicely. Also has Plex OAuth built in, which is convenient. My only complaint is that it’s quite slow in rendering - needs some kind of cache I’d guess.

Lazy Librarian

LazyLibrarian is kind of jank software, but I’ve yet to find something that’s as nice as the *arrs for managing and especially requesting ebooks. Readarr was seemingly the answer but it never got out of beta and recently was archived as a project. After a bit of jerry-rigging, I can now request a book from anywhere and it’ll show up on my Kindle (which I need to jailbreak once the latest firmware is cracked).

Cloudflare Tunnels

Cloudflare Tunnels make it mega easy to make self-hosted stuff securely accessible externally. The setup process is surprisingly straightforward and removes a lot of the complexity around exposing services safely.

Tailscale

Yes, like everyone else I think Tailscale is great.

Between Tailscale and Cloudflare Tunnels, everything I host is now accessible outside my network. If I’m the only one who needs access, it’s hooked up to Tailscale. If others need access (e.g., friends and family requesting media through Overseerr), it’s available via Cloudflare Tunnels with some kind of auth.

Bonus: Code on the Go with a-shell, SSH and Claude Code!

I added a-shell to my phone with some aliases to quickly SSH into my server, where I mostly use it to interact with Claude Code. There aren’t a ton of times I’m on the go and want to code, but ngl it feels pretty cool when I do.

Removed Things

These friends said goodbye:

Ombi

Replaced by Overseerr - the UI is cleaner and the Plex integration works better.

Readarr

Project archived/on hiatus. Was never able to get it to work well anyway, so no big loss here.

The homelab is finally in a state where I’m not embarrassed to show it to other people. More importantly, it’s actually maintainable now. Sometimes all it takes is a few weeks of very expected (and enjoyable) free time to tackle the pile of side-project tech debt that’s built up over the years.

Also the kid is much cooler than anything on my homelab, so that’s a win too.

📝 Article

MCP-Filesystem: From Mittens to Surgical Gloves for AI File Operations

“Can you add these properties to all my Obsidian notes?” I asked Claude, thinking I’d just handed it a simple task. Two hours later, I was watching in frustration as it exhausted its context window trying to load entire files, then struggled to make targeted edits without rewriting everything from scratch.

The dreaded rate limit popped up. Try again in ~3 hours. Joy.

This wasn’t an isolated incident. MCPs are awesome, but for local file editing I kept hitting the same wall: current filesystem tools for AI assistants are too primitive for real-world use.

The problem isn’t Claude’s intelligence - it’s the crude tools we’ve given it for navigating our filesystems. Current MCP filesystem servers treat files as monolithic blobs, forcing assistants to process everything even when they need just a few lines.

After a weekend of Obsidian frustration, I built something better: MCP-Filesystem, a Model Context Protocol server that gives Claude and other AI assistants the ability to work with files in a smarter, more efficient way.

What’s an MCP Server Anyway?

MCP (Model Context Protocol) servers are intermediaries that connect AI models like Claude to external tools and data sources. They follow a client-server architecture that lets AI assistants access capabilities beyond their built-in functions.

Without an MCP server, Claude (or other AI tools) can only work with what you directly paste into your conversation. With an MCP server, Claude gains new abilities - it can take actions using tools and enrich its context with relevant information.

Setting up an MCP server is like setting Claude free from the constricted box of a chatbot UI. The standard MCP file server does the basics - it can open, read, and write files - but it’s like giving Claude mittens instead of precision tools.

My MCP-Filesystem implementation gives Claude the equivalent of surgical gloves and a full toolbox for file operations.

Surgical tools and access to the filesystem, what could go wrong (actually nothing for me—yet.)
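To make that concrete, here's roughly what defining a tool on an MCP server looks like with the official Python SDK's FastMCP interface (a toy stand-in, not MCP-Filesystem's actual code):

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("toy-filesystem")

@mcp.tool()
def read_file_lines(path: str, offset: int = 0, limit: int = 50) -> str:
    """Return up to `limit` lines of a file, starting at `offset`."""
    with open(path) as f:
        lines = f.readlines()
    return "".join(lines[offset:offset + limit])

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio for clients like Claude Desktop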

The MCP-Filesystem Difference: Context-Aware Intelligence

Most filesystem MCP servers are fundamentally limited in how they let AI assistants interact with files. They typically:

  1. Load entire files into the AI’s context window, wasting precious tokens on irrelevant content
  2. Lack efficient search capabilities across multiple files or within large files
  3. Provide only basic editing functions with little verification or precision
  4. Treat the filesystem as a static repository rather than a dynamic workspace

MCP-Filesystem takes a different approach. It’s designed specifically for intelligent context management - giving AI assistants the ability to:

  1. Retrieve only relevant content with precise line targeting and context controls
  2. Search intelligently within and across files with powerful grep-like capabilities
  3. Make surgical edits with content verification to prevent conflicts
  4. Navigate efficiently through large file structures without context bloat

The difference is quite noticeable in practice. Where standard MCP filesystem servers quickly exhaust Claude’s token capacity on large files, MCP-Filesystem lets it work efficiently with projects of any size. Instead of loading entire files to find one function definition, it can search precisely and retrieve just what’s needed, with exactly the right amount of surrounding context.

The result is an AI that can work alongside you on real-world projects, finding exactly what it needs and making precise changes without exhausting its token budget on irrelevant content.

Smart Capabilities That Make the Difference

Intelligent Search and Retrieval

MCP-Filesystem’s grep-like search makes it possible to search file contents, not just match filenames and read entire files:

# Traditional approach - load entire file, scan manually
entire_file = read_file("/path/to/large_file.py")
# Consumes thousands of tokens for potentially irrelevant content

# MCP-Filesystem approach
results = grep_files(
    "/path/to/project",
    "function process_user_data",
    context_before=2,
    context_after=5,
    include_patterns=["*.py"],
    results_limit=20
)
# Returns precisely what's needed with perfect context control

The server uses ripgrep under the hood when available, giving Claude blazing-fast search capabilities across massive codebases and files - all while remaining token-efficient.
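The pattern is roughly "use rg if it's on PATH, else fall back to pure Python" - a hedged sketch of that idea (illustrative, not the project's actual code):

import re
import shutil
import subprocess
from pathlib import Path

def grep(pattern: str, root: str) -> list[str]:
    if shutil.which("rg"):
        # ripgrep: fast, respects .gitignore, skips binary files
        result = subprocess.run(["rg", "--line-number", pattern, root],
                                capture_output=True, text=True)
        return result.stdout.splitlines()
    # Pure-Python fallback: slower, but zero external dependencies
    rx, hits = re.compile(pattern), []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            for i, line in enumerate(path.read_text().splitlines(), 1):
                if rx.search(line):
                    hits.append(f"{path}:{i}:{line}")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary/unreadable files
    return hits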

Surgical File Operations

When editing files, precision matters. MCP-Filesystem offers targeted operations that eliminate the risk of unintended changes:

# Make precise edits with verification
edit_file_at_line(
    "/path/to/file.py",
    line_edits=[{
        "line_number": 42,
        "action": "replace",
        "content": "    return processed_data",
        "expected_content": "    return data"  # Verify before changing
    }],
    abort_on_verification_failure=True
)

This verification system ensures Claude only changes what it intends to, preventing those frustrating moments where an AI assistant inadvertently modifies the wrong code section.

Line-Targeted Reading

When working with large files, MCP-Filesystem lets Claude read only what it needs:

# Instead of loading the entire file
content, metadata = read_file_lines(
    "/path/to/large_file.py",
    offset=99,   # Start at line 100
    limit=20     # Read just 20 lines
)

This makes a massive difference when working with files that would otherwise consume thousands of tokens.

How I’m Actually Using This

Since building this tool, I’ve found several ways it’s changed how I work with Claude:

I work on several projects with sprawling codebases.

While Claude Desktop is not my daily driver for AI coding, it’s quite useful for updating documentation or writing a quick file after some back and forth, rather than switching to an IDE (or neovim, let’s gooooo).

Before, I’d spend time manually opening relevant files for Claude to analyze. Now, I can just ask:

“Find all the places where we use the mcp.tool decorator and explain the pattern”

Claude uses grep_files to find the relevant code sections, then read_file_lines to examine specific implementations. It can build a comprehensive understanding without me having to play tour guide through my own code.

Dealing With Those Inevitable “Big Files”

We all have them - the massive config files, the documentation monoliths, the “god files” with 1000s of lines of unmaintained code. For me, it was particularly painful with my Obsidian notes and some legacy code files.

Instead of watching Claude load 2000+ lines when I only need a small change, I can now be specific:

“Find all my Obsidian daily notes that are missing the ‘tags’ property and add a default tags section”

“Look at my Neovim config related files and add Telescope keybind to the plugin I just added with my standard keybindings”

Claude finds the relevant section, retrieves just what it needs with appropriate context, and makes precise edits. No more token bloat, no more rewriting entire files for small changes.

Better Tools for Smarter Assistants

AI models are becoming increasingly capable - but they’re still limited by the tools we give them. MCP-Filesystem fills a gap in the existing toolchain, allowing AI assistants to work more effectively with your files.

With tools like this, I no longer need to hold Claude’s hand through every file operation. It can find relevant information across my projects, make targeted edits, and preserve more of its context window for actual thinking rather than storing unnecessary file content.

And while there’s a small performance cost compared to raw file operations (particularly when using the Python fallback instead of ripgrep), the token efficiency and precision gained make it worth the trade-off.

This approach saves tokens while enabling more practical workflows with AI assistants, especially for coding, writing, and information management tasks.

Getting Started

MCP-Filesystem is open-source and easy to set up:

1. Clone the Repository

git clone https://github.com/safurrier/mcp-filesystem.git

2. Update Claude Desktop Configuration

Edit your Claude Desktop configuration file:

On macOS:

~/Library/Application\ Support/Claude/claude_desktop_config.json

On Windows:

%APPDATA%\Claude\claude_desktop_config.json

Add the MCP-Filesystem server to the config, along with the directories you want to allow:

{
  "mcpServers": {
    "mcp-filesystem": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/mcp-filesystem/repo",
        "run",
        "run_server.py",
        "/Users/yourusername/Projects",
        "/Users/yourusername/Documents"
      ]
    }
  }
}

3. Restart Claude Desktop

Close and reopen Claude Desktop for the changes to take effect.

4. Verify It’s Working

Ask Claude to list the allowed directories to verify the setup:

“Can you list the directories you’re allowed to access through the MCP-Filesystem?”

Claude should use the list_allowed_directories tool and show you the paths you configured.

Learn More

Detailed documentation, examples, and advanced configuration options are available in the project README.

Try it out and experience the difference that intelligent file operations make when working with AI assistants. Your projects will thank you - and you’ll never go back to watching Claude try to edit files with mittens on again.

📝 Article

Modernizing My Python Project Template

Setting up Python projects consistently has always been stupidly painful.

Despite Python being touted as the go-to language for new developers, its dev env setup is a mess.

Like many, I have hailed the arrival of uv as manna from heaven.

After working with Eugene Yan’s Python collaboration template, I found myself wanting to update it with tools that better reflect current development practices. The result is a modernized template that focuses on speed and reliability while reducing setup friction.

Python project template repo link

Core Updates

The most significant change is the switch to uv for dependency management. In practice, uv handles everything from virtual environments to package installation with noticeably better performance than traditional tools. This alone makes a substantial difference in day-to-day development.

The template now includes:

  • uv for fast, reliable dependency management
  • ruff for unified linting and formatting
  • mypy for static type checking
  • pytest with coverage reporting
  • Docker support with multi-service compose configuration
  • GitHub Actions for automated quality checks

Project Setup

The setup process is straightforward:

git clone git@github.com:safurrier/python-collab-template.git my-project
cd my-project
make init

The initialization handles the essentials - development environment, git setup, pre-commit hooks, and example code management. More importantly, it creates a consistent foundation that you can build on without having to revisit basic configuration issues.

Development Workflow

In practical terms, the template provides two key benefits: speed and consistency. The uv integration significantly reduces time spent on dependency management, while automated checks catch common issues before they make it into the codebase. All quality checks run with a single command (make check), and the same checks run automatically on GitHub with each push.

The template strikes a balance between being opinionated enough to be immediately useful while remaining flexible enough to adapt to different project needs. Check out the repository here.

💭 Note

Firecrawl: A Simple Web Scraping Script

Code is available via gist here

A command-line utility for scraping web content through the Firecrawl API. The script processes URLs—either individually or in bulk from a text file—and outputs clean, formatted content with automatic file naming based on page titles.

Beyond basic scraping, it handles the tedious aspects of web content extraction: stripping navigation elements, removing duplicate content, and managing filename collisions.

The script maintains a focused scope: it takes a URL or file of URLs as input, processes them through Firecrawl’s API endpoints, and saves the results in a specified directory (defaulting to ./scraped when no output location is provided).
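At its core, each URL becomes one API call - a hedged sketch of that request (assuming Firecrawl's v1 scrape endpoint and response shape; see the gist for the real implementation):

import os
import requests

def scrape(url: str) -> str:
    # One POST per URL; the API returns cleaned markdown for the page
    resp = requests.post(
        "https://api.firecrawl.dev/v1/scrape",
        headers={"Authorization": f"Bearer {os.environ['FIRE_CRAWL_API_KEY']}"},
        json={"url": url, "formats": ["markdown"]},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["data"]["markdown"]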

It also makes use of inline dependencies (PEP 723) to ensure the script is self-contained and can be run anywhere with minimal setup.
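That's the inline script metadata format that uv understands - a comment block at the top of the script, something like this (the exact dependency list in the gist may differ):

# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "requests",
# ]
# ///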

With the magic of uv you can even run this via a gist URL, making it extremely portable.

Usage

Locally

# Set your API key
export FIRE_CRAWL_API_KEY=your-api-key-here

# Run with a single URL
uv run firecrawl_scrape.py https://example.com

# Or with a file containing URLs
uv run firecrawl_scrape.py urls.txt

# Optionally specify output directory (default: ./scraped)
uv run firecrawl_scrape.py urls.txt -o ./my-scrapes

Via gist

# Set your API key
export FIRE_CRAWL_API_KEY=your-api-key-here

# Run with a single URL
uv run https://gist.githubusercontent.com/safurrier/8714235a36a5dc502a8f4b2edb98ece3/raw/969f25a37895943725e8a42cae6a219bda3565fa/firecrawl_scrape.py https://example.com

# Or with a file containing URLs
uv run https://gist.githubusercontent.com/safurrier/8714235a36a5dc502a8f4b2edb98ece3/raw/969f25a37895943725e8a42cae6a219bda3565fa/firecrawl_scrape.py urls.txt

# Optionally specify output directory (default: ./scraped)
uv run https://gist.githubusercontent.com/safurrier/8714235a36a5dc502a8f4b2edb98ece3/raw/969f25a37895943725e8a42cae6a219bda3565fa/firecrawl_scrape.py urls.txt -o ./my-scrapes

A simple script with a simple purpose. Plus by running via gist you have this available anywhere, anytime.

📝 Article

Paper Review: SimPER — Simple alignment with Perplexity optimization

Paper + Code

Understanding AI model behavior often feels like trying to teach a brilliant but literal-minded alien how humans think. SimPER offers a refreshingly straightforward approach to this challenge - it strips away the complexity of preference learning and replaces it with a simple principle: if humans prefer one response over another, the model should find the preferred response natural and the rejected one strange.

Key Concepts

Preference optimization traditionally requires careful tuning of multiple parameters and reference models. SimPER eliminates this complexity by using perplexity - a well-known evaluation metric that measures how well a language model predicts text. The core idea is expressed in the following equation:

$$L_{SimPER} = -\exp\left(\frac{1}{|y_w|} \log \pi_\theta(y_w|x)\right) + \exp\left(\frac{1}{|y_l|} \log \pi_\theta(y_l|x)\right)$$

where:

  • $\pi_\theta(y|x)$ is the language model policy generating sequence $y$ given input $x$
  • $y_w$ and $y_l$ are the chosen and rejected responses from the preference dataset
  • $|y|$ represents sequence length for normalization
  • The negative exponentiated term minimizes perplexity for chosen responses
  • The positive exponentiated term maximizes perplexity for rejected responses

The sequence length normalization (1/|y|) provides natural handling of different response lengths, addressing a key challenge in previous approaches.
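For reference, writing out the standard definition of sequence perplexity makes the loss read as "minimize the perplexity of the chosen response, maximize it for the rejected one":

$$PPL_\theta(y|x) = \exp\left(-\frac{1}{|y|} \sum_{t=1}^{|y|} \log \pi_\theta(y_t \mid x, y_{<t})\right), \qquad L_{SimPER} = -\frac{1}{PPL_\theta(y_w|x)} + \frac{1}{PPL_\theta(y_l|x)}$$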

This replaces Direct Preference Optimization (DPO), which requires both hyperparameter tuning and a reference model:

$$L_{DPO} = -\log \sigma\left(\beta\left[\log \frac{\pi_\theta(y_w|x)}{\pi_{ref}(y_w|x)} - \log \frac{\pi_\theta(y_l|x)}{\pi_{ref}(y_l|x)}\right]\right)$$

where:

  • $\pi_{ref}$ is a reference model needed to constrain policy updates
  • $\beta$ is a critical hyperparameter controlling deviation from the reference model
  • The ratio terms measure relative probability between current and reference policies

Improving Gradient Stability

Traditional approaches face gradient instability due to their KL divergence formulation. As shown in the paper’s gradient analysis (Section 3.3), DPO’s gradient takes the form:

$$\nabla_\theta L_{DPO} = -\beta E_{(x,y_w,y_l)}\left[w_\theta \cdot \left(\frac{\nabla_\theta \pi_\theta(y_w|x)}{\pi_\theta(y_w|x)} - \frac{\nabla_\theta \pi_\theta(y_l|x)}{\pi_\theta(y_l|x)}\right)\right]$$

where $w_\theta = \sigma\left(\beta \log \frac{\pi_\theta(y_l|x)}{\pi_{ref}(y_l|x)} - \beta \log \frac{\pi_\theta(y_w|x)}{\pi_{ref}(y_w|x)}\right)$ represents the gradient weight.

When $\pi_\theta(y_l|x) \to 0$, the norm of the gradient on rejected responses becomes large, leading to:

  1. Huge parameter updates focused on decreasing rejected response likelihood
  2. Potential instability in training
  3. Decreased likelihood of both chosen and rejected responses, as they often share tokens

SimPER’s gradient, derived from perplexity optimization, has a more balanced form:

$$\nabla_\theta L_{SimPER} = -E_{(x,y_w,y_l)}\left[\nabla_\theta\, p_\theta(y_w|x) - \nabla_\theta\, p_\theta(y_l|x)\right]$$

where $p_\theta$ represents the geometric mean over token probabilities. This formulation:

  1. Naturally bounds gradients without explicit constraints
  2. Better balances updates between chosen and rejected responses
  3. Prevents catastrophic decreases in chosen response likelihood

Empirical evidence in Figure 3 of the paper demonstrates this stability, showing SimPER maintains higher chosen response likelihood while achieving similar preference margins.

Theoretical Foundation: Total Variation Distance

SimPER’s perplexity optimization connects to Total Variation Distance (TVD), as proven in Theorem 3.1 of the paper. TVD between two distributions is defined as:

$$TV(p\|q) = \frac{1}{2} \sum_{x \in X} |p(x) - q(x)|$$

The paper proves that minimizing perplexity asymptotically optimizes TVD between the model distribution and chosen response distribution:

$$\min_\theta L_{SimPER} \Rightarrow \min_\theta TV(\pi_{chosen}(y|x) \,\|\, \pi_\theta(y|x))$$

This theoretical connection explains several key properties:

  1. Mode-seeking behavior due to TVD’s focus on absolute differences
  2. Natural bounds on optimization (TVD ∈ [0,1])
  3. Robustness to outliers compared to KL divergence

Behavioral Patterns: Mode-Seeking vs Mode-Covering

The paper demonstrates (Figure 2) fundamental differences in how SimPER and DPO handle uncertainty:

Mode-Covering (DPO):

  • Minimizes forward KL divergence, leading to mass-covering behavior
  • Maintains probability across all reasonable responses in the dataset
  • Can overestimate the long tail of the target distribution
  • Shows better performance on tasks requiring diverse outputs

Mode-Seeking (SimPER):

  • Minimizes TVD, leading to mode-seeking behavior
  • Concentrates probability mass on high-confidence regions
  • Similar to behavior observed in RLHF systems
  • Particularly effective for tasks requiring precise responses

This theoretical distinction is supported by empirical results showing SimPER’s superior performance on reasoning-heavy tasks (Table 3 in the paper), where decisive responses are crucial.

Implementation Details

The paper provides a straightforward implementation that achieves these theoretical benefits (sketched here for a causal LM, with model assumed to be loaded elsewhere):

import torch

def calculate_perplexity(input_ids, attention_mask):
    outputs = model(input_ids, attention_mask=attention_mask)
    # Shift by one: logits at position t predict the token at position t+1
    log_probs = outputs.logits[:, :-1].log_softmax(-1)
    neg_log_probs = -log_probs.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    # Average only over real (non-padding) tokens
    mask = attention_mask[:, 1:]
    mean_neg_log_prob = (neg_log_probs * mask).sum(dim=1) / mask.sum(dim=1)
    return torch.exp(mean_neg_log_prob)

def compute_loss(chosen_ids, chosen_mask, rejected_ids, rejected_mask):
    # L_SimPER = -1/PPL(chosen) + 1/PPL(rejected), since exp(mean log prob) = 1/PPL
    chosen_perplexity = calculate_perplexity(chosen_ids, chosen_mask)
    rejected_perplexity = calculate_perplexity(rejected_ids, rejected_mask)
    return -1/chosen_perplexity + 1/rejected_perplexity

Empirical Results

The paper validates these theoretical advantages with extensive experiments showing:

  • Up to 5.7 point improvements on AlpacaEval 2
  • Consistent outperformance across 10 Open LLM Leaderboard benchmarks
  • Superior results on reasoning-heavy tasks like MT-Bench
  • Better maintenance of chosen response likelihood during training

Conclusion

The elegance of SimPER’s approach echoes an important lesson in machine learning - sometimes simpler solutions not only work better but tell us something fundamental about the problem itself. By reducing the number of assumptions built into preference learning systems through perplexity optimization, SimPER achieves both theoretical elegance and practical performance. The fact that such a straightforward approach can match or exceed more complex methods while eliminating hyperparameters points to promising directions for future research in language model alignment.

📝 Article

Unclever Code: Metaprogramming for Mortals

be me

tired of writing the same validation code

“wait, you can make code write code?”

discovers Python decorators

first decorator: simple logging

second decorator: input validation

third decorator: caching

“I AM BECOMING UNLIMITED”

one year later

trying to explain my “framework” to new team member

they’re crying

i’m crying

the code is crying

They say the road to production hell is paved with clever abstractions. Metaprogramming - that practice of “code that manipulates code” - might just be the express lane. One minute you’re feeling like a coding deity, orchestrating an elegant dance of decorators and metaclasses. The next, you’re seven layers deep in a stack trace, trying to figure out which middleware function decided to silently convert your integers to strings.

Behold, the “hat on a hat” of decorative metaprogramming solutions:

# this function is so top heavy it's about to tip over
@log_everything
@add_metrics
@middle_decorator # goes in the middle
@handle_errors
@validate_input
@cache_results
def business_logic():
    # the real treasure was all those decorators
    # we passed through along the way
    return "No"

I’m here to present a different perspective: metaprogramming isn’t about clever tricks or reducing lines of code—it’s about extending a language (here: Python) to better express your domain’s concepts.

Common Utility Patterns

Before diving into domain-specific territory, let’s address those utility patterns that every developer discovers eventually. Logging, caching, retries—they’re like the gateway drug to metaprogramming:

@cache(ttl=3600)
@retry(max_attempts=3)
@log_calls
def fetch_data():
    pass

While these patterns are useful, they should be:

  1. Used sparingly and explicitly
  2. Composed thoughtfully
  3. Focused on operational needs

Here’s what thoughtful utility usage looks like:

# Being explicit about what we're caching and why
@cache_to_redis(
    ttl="1h",
    key_prefix="user_data",
    invalidate_on=["user_update"]
)
def get_user(user_id: int) -> User:
    return db.fetch_user(user_id)

# Clear about retry behavior because network calls are fickle
@retry(
    max_attempts=3,
    on_exceptions=[NetworkError],
    backoff_factor=1.5
)
def external_api_call():
    pass

Employing metaprogramming like this still reduces the overall code footprint significantly, and the small amount of extra verbosity at the point of use is a worthy tradeoff for that explicitness!

Why Use Metaprogramming?

1. Domain Expression

Sometimes Python’s built-in syntax just doesn’t naturally fit the concepts you’re trying to express. That’s where metaprogramming shines.

Take Django’s database models. An example of making Python speak database

from django.db.models import Model, CharField, EmailField
from django.db.models.signals import post_save
from django.dispatch import receiver

class User(Model):
    name = CharField(max_length=100)
    email = EmailField(unique=True)

# Signal receivers live at module level, wired to the model by sender
@receiver(post_save, sender=User)
def send_welcome_email(sender, instance, created, **kwargs):
    if created:
        send_email(instance.email)  # send_email: illustrative helper

It’s Python, but it’s databases. Clarity.

2. Clarity at the Point of Use

When you want to hide complexity but keep intent clear.

E.g. FastAPI, the much-loved web framework. One key selling point: it’s readable and comes with useful free stuff (type hints → API documentation, b/c you know developers will not actually write documentation).

# FastAPI: Easy to read, easy to use
@app.get("/users/{user_id}")
@requires_auth
def get_user(user_id: int) -> User:
    return db.get_user(user_id)

3. Enforcing Patterns

When you need to ensure consistent behavior without writing the same boilerplate 500 times:

# Pydantic: Because runtime errors are so 2010
class UserCreate(BaseModel):
    name: str
    age: int = Field(gt=0, lt=150)
    email: EmailStr
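In use, bad data fails loudly at construction time instead of deep inside business logic (a quick sketch):

from pydantic import ValidationError

try:
    UserCreate(name="Ada", age=-3, email="not-an-email")
except ValidationError as e:
    print(e)  # flags both the age bound and the malformed email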

The Three Tools of Metaprogramming

1. Decorators

Think of decorators as function modifiers—they let you wrap existing functions with new behaviors. The beauty is that they’re explicit about what they do right at the point of use (note: beauty is still in the eye of the beholder)

# A simple route decorator shows exactly what this endpoint does
@app.post("/users/", status_code=201)
def create_user(user: UserCreate):
    return db.create_user(user)

# Authentication decorators make security requirements clear
@requires_permission("admin")
def delete_user(user_id: int):
    return db.delete_user(user_id)

2. Metaclasses

Metaclasses are the behind-the-scenes directors of class creation—they determine how your classes are built and behave. They’re powerful but complex—like a chainsaw, they can either help you build something amazing or cause spectacular disasters.

class ModelMetaclass(type):
    def __new__(cls, name, bases, attrs):
        # This is where the magic happens...and where
        # stack traces go to die
        new_class = super().__new__(cls, name, bases, attrs)
        for key, value in attrs.items():
            # Field here is an illustrative descriptor/field class
            if isinstance(value, Field):
                value.contribute_to_class(new_class, key)
        return new_class

class Model(metaclass=ModelMetaclass):
    pass

3. Descriptors

Descriptors give you fine-grained control over attribute access. They’re perfect for when you need to add validation, computation, or tracking to class attributes:

class Positive:
    def __get__(self, obj, type=None):
        return obj._value

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError("Must be positive")
        obj._value = value

class Account:
    balance = Positive()  # Now balance must always be positive
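In use, the descriptor guards every assignment (a quick sketch):

account = Account()
account.balance = 100   # fine
account.balance = -50   # raises ValueError: Must be positive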

Common Pitfalls and Solutions

The Stack Trace Trap

Ever seen a stack trace that looks like it’s been through a paper shredder? Decorators are usually the culprit:

# The problematic child
def log_calls(func):
    def wrapper(*args, **kwargs):
        print(f"Calling {func}")  # Which function? Who knows!
        return func(*args, **kwargs)
    return wrapper

@log_calls
def important_calculation(x, y):
    return x + y

# The responsible adult
from functools import wraps
import logging

def log_calls(func):
    @wraps(func)  # b/c stack traces should be helpful
    def wrapper(*args, **kwargs):
        logging.info(f"Calling {func.__name__} with args={args}, kwargs={kwargs}")
        result = func(*args, **kwargs)
        logging.info(f"{func.__name__} returned {result}")
        return result
    return wrapper

@log_calls
def important_calculation(x, y):
    """Adds two numbers together."""
    return x + y

Performance Pitfalls

Metaprogramming often involves inspecting or modifying code at runtime, which can introduce unexpected performance costs. Nice job writing decorators that make DB queries look fast.

# Slow: Inspecting function attributes on every call
def log_decorator(func):
    def wrapper(*args, **kwargs):
        # This inspection happens every time
        print(f"Calling {func.__name__} with {func.__dict__}")
        return func(*args, **kwargs)
    return wrapper

# Better: Cache expensive operations
from functools import lru_cache

@lru_cache(maxsize=32)
def expensive_meta_operation():
    # Complex reflection or code manipulation here
    return calculate_complex_result()

# Even Better: Do expensive work once at import time
PRECALCULATED_MAPPINGS = {
    # Calculate mappings once when module loads
    # instead of every function call
    'mapping1': calculate_mapping1(),
    'mapping2': calculate_mapping2()
}

def fast_operation():
    return PRECALCULATED_MAPPINGS['mapping1']

The Golden Rules

  • Keep it simple (If you can’t explain why you need it, you probably don’t)
  • Look to existing frameworks for patterns—they’ve already made the mistakes you’re about to make. Django’s model system and SQLAlchemy’s declarative base aren’t accidentally complex
  • Focus on making your domain concepts clear
  • Consider the poor soul who’ll maintain your code (it might be future you)
  • When in doubt, write it twice before abstracting

Metaprogramming is a powerful tool, but like a katana, it’s best used with skill and intent - and not because many neckbeards think it’s the ultimate weapon.

Use it to make your code clearer, not clevererer.
