The last six months in LLMs in five minutes

Simon Willison’s annotated lightning talk from PyCon US 2026.

OpenAI and Anthropic had spent most of 2025 running Reinforcement Learning from Verifiable Rewards to increase the quality of code written by their models, especially when paired up with their Codex and Claude Code agent harnesses.

In November the results of this work became apparent. Coding agents went from often-work to mostly-work, crossing a quality barrier where you could use them as a daily-driver to get real work done, without needing to spend most of your time fixing their stupid mistakes.

The supposedly best model changed hands five times between the three providers. Coding is where most of the action is. Simon’s calm summary remains the best way to figure out what you missed since you last looked up.