Ruining the Fun of March Madness by Having AI Fill Out My Bracket
I pointed Claude at Evan Miya probabilities and Yahoo public picks and told it to build a bracket optimizer. It mostly worked, except for the part where it started picking 16-seeds.
It’s March Madness, which means bracket time. Part of the fun is just picking random stuff, but I’m incapable of turning off my brain completely. I’m a fan of college basketball and I’m somewhat into the analytics, but still a believer in the eye test. I wanted to combine both worlds: be smart about it, use high-quality data sources, but also try to be a little more principled about the contrarian angle. Without becoming that guy who won’t shut up about his bracket strategy.
So I figured, this sounds like a good research task for Claude. We have rich data, we have an idea, let it do the implementation work.
The Data
I pointed it at two sources: Evan Miya’s per-round probability for each team (how likely each team is to actually advance), and Yahoo Sports’ popular picks for each team in each round (what the public is picking). Evan Miya tells you how strong a team is; Yahoo tells you where the public is crowded, which is where the edge is.
I described the task to Claude. Here’s this data, here’s that data, figure out how to get it. Let’s plan out how we could create a bracket optimizer, then build it.
How It Works
Pretty simple. For each possible pick, multiply the team’s actual probability of advancing by how contrarian the pick is (1 minus the public pick percentage). Weight by how many points each round is worth (the championship is worth 32x a first-round game). Then add a pool-size adjustment for the champion pick: in a 75-person pool you need a champion fewer people have, because anyone else holding the chalk pick will also get most of the other games right.
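The core scoring idea fits in a few lines. This is an illustrative sketch, not the actual implementation; the function and constant names are mine, and the 20% champion probability in the example is made up (the 14% public share is the Michigan number from later in the post).

```python
# Standard doubling scoring: rounds 1-6 worth 1, 2, 4, 8, 16, 32 points.
ROUND_POINTS = {1: 1, 2: 2, 3: 4, 4: 8, 5: 16, 6: 32}

def pick_score(advance_prob: float, public_pct: float, round_num: int) -> float:
    """Model probability of the pick hitting, times how contrarian it is,
    weighted by what the round is worth."""
    return advance_prob * (1.0 - public_pct) * ROUND_POINTS[round_num]

# Example: a champion with a 20% model probability that only 14% of the
# public is on.
score = pick_score(0.20, 0.14, 6)  # 0.20 * 0.86 * 32
```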
It scraped both data sources, merged them, ran a constrained sampler (generate thousands of brackets, filter by structural rules, pick the highest-scoring one), and output optimized brackets. It also applied guardrails from Evan Miya’s bracket strategy: roughly 2 one-seeds in the Final Four, at least one 12-over-5 upset, at least one 11-seed win, and nothing with less than a 10% probability of happening.
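The generate-filter-rank loop looks roughly like this. A hedged sketch only: `sample_bracket`, `passes_guardrails`, and `score` are stand-ins for the real pieces, and the toy usage at the bottom exists just to show the shape of the loop.

```python
import random

def optimize(sample_bracket, passes_guardrails, score, n=10_000, seed=42):
    """Generate n candidate brackets, drop any that violate the
    structural guardrails, and keep the highest-scoring survivor."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n):
        bracket = sample_bracket(rng)
        if not passes_guardrails(bracket):
            continue  # e.g. 1-seed counts, 12-over-5 rule, probability floor
        s = score(bracket)
        if s > best_score:
            best, best_score = bracket, s
    return best, best_score

# Toy usage: "brackets" are just integers; the guardrail rejects odd ones.
best, best_score = optimize(
    sample_bracket=lambda rng: rng.randint(0, 100),
    passes_guardrails=lambda b: b % 2 == 0,
    score=lambda b: b,
)
```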
For the men’s bracket, this worked pretty well right away. Michigan as champion (highest model probability but only 14% of the public picking them), sensible value plays like VCU over North Carolina and Texas over BYU. No crazy upsets. Reasonable.
Then the Women’s Bracket
Evan Miya doesn’t do women’s basketball, so Claude needed another probability source. It grabbed Massey Ratings (publicly available team strength numbers) and built a Monte Carlo simulator to generate round-by-round probabilities. Fine in theory. The problem: it needed to convert Massey’s rating scale into actual win probabilities, and it just made up a number. A logistic function with scale=1.5, presented with the same confidence as the parts it actually researched. No calibration against historical data, no sanity check. Just a best guess that happened to produce numbers that looked plausible.
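For concreteness, here is roughly what that uncalibrated conversion amounted to. The function name is mine; scale=1.5 is the invented parameter.

```python
import math

def win_prob_uncalibrated(rating_diff: float, scale: float = 1.5) -> float:
    """Logistic map from rating difference to win probability."""
    return 1.0 / (1.0 + math.exp(-rating_diff / scale))

# With scale=1.5, a 5-point rating edge already maps to a ~96.5% lock --
# numbers that look plausible in isolation but were never checked.
p = win_prob_uncalibrated(5.0)
```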
I caught this and we did proper calibration: looked up the standard deviation of game margins from published research (~11.5 points), fitted a multiplier against historical upset rates (1-seeds beat 16-seeds 99.2% of the time in women’s), and got a well-calibrated model. Good example of AI doing useful research legwork once you actually tell it what to look for.
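One simple form the calibrated version can take, using the numbers from the text: treat game margins as roughly normal with a standard deviation of ~11.5 points, so P(win) = Φ(expected_margin / σ). The fitted multiplier on the raw ratings is elided here, and the 28-point 1-vs-16 margin in the sanity check is my assumption, not a figure from the post.

```python
import math

SIGMA = 11.5  # std dev of game margins, from published research

def win_prob(expected_margin: float, sigma: float = SIGMA) -> float:
    """Normal-CDF conversion from expected margin to win probability."""
    return 0.5 * (1.0 + math.erf(expected_margin / (sigma * math.sqrt(2.0))))

# Sanity check: an assumed ~28-point 1-vs-16 gap lands near the
# historical 99.2% rate quoted in the text.
p = win_prob(28.0)
```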
But then the optimizer started picking 16-seeds to beat 1-seeds.
Two different 16-seeds winning in the first round of the same bracket. The leverage formula was mathematically doing exactly what we asked: UConn has a 94% win probability but 98% of the public picks them, so their “leverage” is near zero. Meanwhile UTSA has a 6% probability but only 2% of the public picks them, so their leverage is actually higher. The formula was rewarding “nobody picked them” without weighing “because it’s a 6% event.”
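The failure mode is easy to reproduce with the numbers above (function name mine):

```python
def leverage(win_prob: float, public_pct: float) -> float:
    """Raw contrarian leverage: probability times (1 - public share)."""
    return win_prob * (1.0 - public_pct)

uconn = leverage(0.94, 0.98)  # ~0.019: near zero, because everyone picks them
utsa = leverage(0.06, 0.02)   # ~0.059: "higher leverage" on a 6% event
```

The formula genuinely prefers the 16-seed, which is exactly the bug.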
The system was totally confident about it too. Obviously insane to anyone who watches basketball. The fix: the men’s bracket already had guardrails from Evan Miya’s strategy principles, including a rule that you never pick anything with less than 10% probability of happening. The women’s bracket didn’t have those guardrails applied. Once we added them, the insane upsets disappeared and the brackets snapped back to sensible. But the system never would have flagged the problem on its own. It was happy with the output.
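The fix is a one-line guardrail, applied as a filter before scoring. A sketch, with names of my choosing:

```python
MIN_PROB = 0.10  # guardrail: never pick anything with <10% chance of happening

def passes_probability_floor(game_probs) -> bool:
    """game_probs: model win probability for every pick in the bracket."""
    return all(p >= MIN_PROB for p in game_probs)

# A bracket containing the 6%-probability 16-seed pick gets rejected.
ok = passes_probability_floor([0.94, 0.62, 0.35])
bad = passes_probability_floor([0.94, 0.62, 0.06])
```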
The Point
AI does the grunt work great. Scraping websites, merging data sources, building the optimizer, running simulations. I didn’t write a line of code for any of this and the implementation was solid.
But it’ll happily fill in gaps you didn’t ask it to fill, with the same confidence it uses for things it actually knows. The scale=1.5 was invented. The leverage formula worked on paper but produced insane results at the extremes. Without the guardrails, the output was nonsensical.
Same pattern as always. AI does the implementation, you do the thinking. You still have to look at the output and ask whether it makes sense.
The picks:
Men’s: Michigan (1) over Houston (2). Final Four: Duke, Houston, Purdue, Michigan.
Women’s: Texas (1) over South Carolina (1). Final Four: UConn, South Carolina, LSU, Texas.
Don’t @ me.