Saturday, February 28, 2026

34 Years in the Panopticon: Specifications, Formal Methods, and the AI Inflection Point


Monday 2/24/2026 marks 34 years since I started working full-time post-undergrad (2/24/1992). Another lap around the sun in what I've come to think of as the long game of systems software. Last year's post https://vzimmer.blogspot.com/2025/03/33-years-in-panopticon-books-et-al.html touched on the "prisoner missing his cell" metaphor—retirement that didn't quite stick. A year later, I'm increasingly convinced we're at an inflection point where the nature of the cell itself is changing. Not just in work practices, but in what "work" fundamentally means in software engineering.

The Specification Bottleneck I've been reading through research on https://latentintent.substack.com/p/model-intelligence-is-no-longer-the, and it crystallized something I've been observing: **AI models can now match IMO gold medalists at formal mathematics but struggle with basic enterprise CRUD apps**. The paradox isn't about intelligence—it's about specification completeness. When we write code, we're performing lossy compression of intent. The "why" gets lost in the "what." In a human-to-human handoff, we reconstruct intent through context, documentation, and institutional knowledge. But when AI enters the loop, that lossy compression becomes the bottleneck. Consider the analogy: **Prompting an AI and keeping only the generated code is like compiling source code and keeping only the binary.** Sure, it works—until you need to modify it, debug it, or extend it. Formal Methods Go Mainstream (Maybe?) The fascinating development is that formal specification is becoming tractable at scale. DeepSeek-Prover-V2 https://github.com/deepseek-ai/DeepSeek-Prover-V2 demonstrates AI decomposing complex theorems into subgoals and synthesizing complete formal proofs in Lean 4. This bridges informal reasoning (the "why") with formal verification (the "what"). I've spent a lot of time recently writing lean files to help prove properties of well-beloved specs and code. I keep thinking about Gödel's Loophole https://en.wikipedia.org/wiki/G%C3%B6del%27s_Loophole—when Kurt Gödel prepared for his U.S. citizenship exam in 1947, he discovered what he believed was a logical inconsistency in the Constitution that could legally transform it into a dictatorship. The exact loophole was never published (Einstein and Oskar Morgenstern deflected the conversation during his citizenship interview), but the incident illustrates something profound: **Even the most carefully crafted specifications contain contradictions that aren't apparent until subjected to formal analysis.** This is precisely what we need for specifications: - Intent conflict detection (like Gödel found in the Constitution) - Ambiguity highlighting (what happens in this edge case?) - Property-based examples (concrete test cases from abstract properties) - Traceability (why was this constraint added?) "Specs as Code" and Model Specifications There's a convergence happening. https://model-spec.openai.com/2025-04-11.html represents an attempt to precisely define intended AI model behavior through structured specification rather than ad-hoc training. The spec includes a "chain of command" for resolving conflicting objectives (helpfulness vs. safety vs. legal compliance) and aims to be machine-checkable. This mirrors the broader shift: from implicit behavior to explicit intent. The https://arxiv.org/abs/2310.16218 research explores how to update LLMs with new knowledge without catastrophic forgetting. The parallel to specifications is clear—we need specifications that evolve without losing historical context. Version control for *intent*, not just code. The New IDE If specs become primary artifacts, we need different tools. Current IDEs help with: - Syntax highlighting and completion - Type checking - Refactoring and navigation - Debugging execution The **Specification IDE** needs to help with: - Intent documentation (capturing the "why") - Ambiguity detection (underspecified requirements) - Consistency checking (finding contradictions, à la Gödel) - Example generation (property-based tests) - Traceability (linking intent to generated code) - Intent diff (showing how specifications evolved, not just code changes) Imagine a git diff that shows: ``` - System should handle "reasonable" load + System should handle 10,000 req/sec with p99 latency < 100ms ``` That's far more meaningful than 500 lines of changed code. The Constitution Problem in Software The Constitution analogy runs deeper. Constitutional amendments show intent evolution—the 14th Amendment doesn't delete the 3/5 Compromise clause from the original document; it adds context that overrides it. Historical decisions are preserved with their rationale. Software specifications need this same property: new requirements added with rationale, contradictions flagged, historical decisions preserved. We don't just want the latest version; we want the *evolution* of intent with full context. This is particularly relevant for firmware and platform security—domains where specifications have historically been underspecified, leading to vulnerabilities when implementations made different assumptions.

Democratizing Software Development

Here's where it gets interesting: **when the spec is the code, domain experts become first-class contributors**. Today, contributing requires understanding programming languages, frameworks, design patterns, and deployment pipelines. But specifying intent? Domain experts already do that. A medical professional can specify record requirements. An accountant can formalize tax calculations. A compliance officer can define regulatory constraints. With AI handling spec-to-code translation, the programmer's role shifts from writing code to: - Architecting systems - Designing specification languages - Building verification pipelines - Reviewing generated implementations The Year Ahead At 34 years in, I find myself more interested in the meta-problems than the object-level problems. Not "how do we implement secure boot?" but "how do we specify secure boot such that implementations can be verified and evolved?" The companies and teams that figure out how to capture, maintain, and evolve specifications will have a massive advantage as AI code generation commoditizes implementation. The bottleneck isn't "can we build it?" anymore. It's "can we specify what we want clearly enough?" That's a fundamentally different problem—and it requires fundamentally different tools, skills, and mindsets. --- Random Observations **On AI and "Specs as Code"**: Found an interesting talk reference on https://www.youtube.com/watch?v=8rABwKRsec4 that I need to dig into more. The idea that specifications should be executable, testable, and versioned isn't new—but AI makes it practical.









**On OpenAI's Model Spec**: https://model-spec.openai.com/2025-04-11.html dedicates itself to the public domain with CC0 1.0. That's a statement—they want the specification approach to proliferate, not the implementation details.

**On Formal Methods**: DeepSeek-Prover-V2 using Lean 4 is significant. Lean has always been the "we'll use this in 20 years" formal system. Maybe that timeline just compressed.

**On Work Longevity**: Someone mentioned that if you're still working in tech in your 50s, you're "doing it wrong." I prefer to think I'm doing it *differently*—focusing on problems that compound over time rather than chasing the next framework.

**On Retirement**: "RETIREMENT FAIL!" still makes me laugh. But I'm not sure I failed at retirement as much as retirement failed to provide the intellectual leverage I was looking for. There's something about working on problems that outlive you that's deeply satisfying.

** On applying formal tools to firmware: I dabbled a bit with an innocuous aspect of firmware and formal tooling https://github.com/vincentjzimmer/Documents/blob/master/differential-formal-verification-uefi-dxe.pdf as a UW grad student on leave from AI program..... I need to polish up some of the lean4 work that might be of interest, too, along with rocq, hol, etc. This study was sort of an homage to hard time in PI cores at job.last and job.last.last I suppose.


---

The specification is the new code. The question is: are we ready to think differently about what "writing software" means?

On to year 35.