How the review bar changes
Raise the bar. The bar slipped. This passes our bar. The phrase has been doing a lot of work in engineering organizations for a long time. It was always shorthand. The thing it was shorthand for was a bundle of conditions a diff had to meet to merge: stylistic conformity, structural fit, intent legibility, blast-radius awareness, test discipline. The bundle had no written rubric. It was held in place by social mechanics everyone understood without having to state — the author would defend the choice, the reviewer would push back where the defense thinned, the rest of the team would notice if a strand quietly slipped.
AI authorship did not lower the bar. It pulled the bundle apart. Some strands of what used to be “the bar” are now produced by default and read as evidence of nothing. Some strands that were always under-checked have become non-survivable to under-check. New strands appeared that no review process had a slot for. The team that still treats the bar as a single thing that can be raised or lowered is now negotiating with three or four different bars, in different directions, while continuing to talk about one.
The bar did not slip. The bar became unbundled.
1. The bar was always a bundle
Before AI authorship, the review bar enforced roughly five strands, with a sixth that was assumed rather than written:
- Stylistic conformity. Does this look like code our team writes — capitalization, layout, naming, import order, error-handling idioms. Easy to check, often weighted more than it deserved.
- Structural fit. Does this belong here. Does the diff respect the layering, the module boundaries, the existing separation of concerns. Harder to check, often weighted less than it deserved.
- Intent legibility. Can a future reader figure out what this change was supposed to do. Comments matter less than naming; PR descriptions matter more than either.
- Blast-radius awareness. Does the author know what this touches. Does the diff reach further than the description implies.
- Test discipline. Are there tests; do they exercise the change rather than its mocks; does the test suite still mean something after this lands.
And the assumed sixth: the author’s capacity to defend. The author had thought about the change long enough to answer why this and not the alternative. When a reviewer flagged a strand, the conversation that followed enforced the bar more reliably than any of the strands themselves. The bar was not, in practice, a height. It was a Socratic friction. The strands were the agenda items the friction could land on; the author’s stake was the thing the friction was working against. A diff that survived the friction had been bargained up to the bar.
This is the model “raise the bar” referred to. None of the five strands had a written threshold; all five were enforced through the implicit conversation, anchored by the sixth.
2. Where the bundle decoupled
AI authorship did not affect each strand equally.
Stylistic conformity inflated and became evidence-of-nothing. AI-authored output is stylistically appropriate by default. The reviewer who opens a diff and sees clean, idiomatic code is now seeing a property that used to be a costly signal of care and is now a free byproduct of the training distribution. The strand still passes the bar; it has stopped meaning anything when it passes. A reviewer using stylistic cleanliness as a partial signal of thoughtful work is, under the new conditions, accepting decoration as evidence.
Structural fit degraded across sessions. A single human author holds a structural model of the module across edits, and the strand was, in practice, enforced by that continuity. AI-authored diffs accept local preferences one session at a time. The same engineer, accepting two diffs in the same module two days apart, lands two slightly different structural choices, and neither diff individually trips the bar. The module ends up a patchwork; each diff is fine. The strand reports as passing at the diff level and failing at the module level, which is a kind of decoupling no review checklist was designed to catch.
Intent legibility collapsed at the source. The PR description used to summarize the reasoning the author had already done. Under AI authorship, the description summarizes what was accepted, which is not the same artifact. A diff merges with a description that reads coherent and contains no recoverable intent. The strand reports as passing — there is, after all, a description — and six months later the codebase has a layer of changes whose reasons exist nowhere. The team learns this when an incident asks the codebase a why question and the codebase has no answer to give.
Blast-radius awareness ceased to track author awareness. The old strand was enforced by the author’s knowledge of what their change touched, gained from the slow act of writing it. The new diff’s reach is whatever the model decided; the author’s awareness of that reach is, at best, partial. The reviewer’s instinct to ask did you check this is local? now lands on a different person than the author of the code; the answer “yes, it’s a small diff” is no longer responsive, because the diff’s size is no longer a proxy for the diff’s reach.
Test discipline got harder to read. AI-authored tests are often present, syntactically correct, and exercising something other than what they appear to. A test that mocks the dependency it is supposed to exercise looks identical to a test that exercises a real dependency, on the surface of the diff. The strand still flags missing tests. It stopped flagging hollow ones.
And the assumed sixth — author capacity to defend — dropped to near-zero on the accepted-without-chosen subset of work. The answer to why this and not the alternative is, at best, the model produced this. Sometimes that is honest. It is never a defense in the sense the bar relied on. The mechanism that used to enforce the other five strands has removed itself from the room, and the strands it was enforcing are now expected to enforce themselves.
A team reading these decouplings as the bar slipped has named the symptom and missed the structure. The bar did not slip uniformly. Each strand moved in a different direction, for a different reason, and the bundling that let the team manage all of them with a single phrase has come apart.
3. What the new bar has to check
The strands the old bar enforced implicitly now have to be explicit, and the rubric has to extend. Three additions matter under AI authorship that did not have to be named before.
- Provenance. A diff that was chosen and a diff that was accepted read identical on the surface and behave differently in every subsequent encounter — debugging, refactoring, post-mortem, onboarding. The new bar has to distinguish them. The cheap version of this is a structured field in the PR template that asks the author to state, in one sentence, the choice they made and the reason they made it. Accepted as-is is a valid answer. It is also a marker the codebase needs in order to know what kind of artifact it is being asked to reason about later.
- Cross-file invariants. This strand was always in the bar and always under-checked. Under AI authorship the under-checking is not survivable, because the modal failure of AI-authored diffs is locally correct, globally wrong. The new bar has to demand, in the description, an explicit statement of which invariants outside the file the diff respects — not as bureaucracy, but as a forcing function on the author to have actually looked. A diff that the author cannot articulate the global constraints of is a diff the author has not reviewed; merging it is not review, it is reception.
- Reversibility. Can this be undone without re-running the same flawed inference that produced it? Or has the diff embedded a structural choice that the next three diffs will assume, locking the team into a direction nobody decided to commit to? Reversibility used to be implicit in the small-diff discipline; AI-authored work tends toward larger landings, and the proxy diff size used to be is no longer doing the work. The bar has to read reversibility directly.
And one strand the bar must demote rather than keep: stylistic conformity has to be moved out of the evidence base for thoughtful work. It still belongs in linting, where it costs nothing and catches the obvious. It no longer belongs in review, where its presence will mislead the reviewer into thinking a signal has been read. A diff that looks idiomatic is, on its own, neither a positive nor a negative signal about the quality of the change. Treating it as positive is the most common way the new bar quietly lowers itself, one merge at a time, while the team continues to feel that standards are being upheld.
The strands together no longer form a single height a memo can raise. The bar is now a small, explicit checklist, with an author-defended rationale section, in a tooling layer the team owns. That sounds heavyweight. It is the price of the old social mechanism that used to do the same work for free.
4. What “raising the bar” now means
The phrase has not gone away. It will keep getting used in retros, in planning meetings, in the senior-engineer pep talk about how the codebase is sliding. It is operationally useless without three prior questions answered.
- Which strand? Not the bundle. The specific strand. Stylistic conformity does not need raising — it needs demoting. Intent legibility might need a structured rationale field. Provenance might need to exist at all.
- Enforced by what evidence? Since the old evidence base decoupled, the strand-specific rubric has to be re-grounded in something the reviewer can read off the diff or the description. The PR template, the lint rules, the structured fields are the surface the new bar lives on; the old surface — the author’s tacit defense — is no longer reliably present.
- Defended by whom? Since the author is no longer the keeper of the rationale by default, the team has to decide who owns the strand on a given diff — the author when they actually chose, a designated module owner when the author accepted, an architect-on-call for the cross-file invariants strand on diffs that touch shared contracts — and write that ownership down.
A team that has answered these three questions for each strand has a working review bar under AI authorship. A team that has not is still asking how do we tighten review, which is the right intention pointed at the wrong abstraction. Tightening a bundle whose strands moved in different directions is, depending on the strand, redundant, futile, or actively misdirected.
Closing
“The review bar” pictures a single line a diff either clears or does not, enforced by a community of people who roughly agree on where the line is and keep each other honest about it. AI authorship retired most of the social conditions that made the shorthand work. The line is now a constellation; the agreement is uneven; the enforcement mechanism has to be rebuilt one strand at a time, with the rebuild visible in tooling and templates rather than carried in shared understanding.
None of the strands are exotic. Provenance, structural coherence, intent, blast radius, reversibility — these are the criteria thoughtful reviewers were already applying, when they had the time and the author had the answers. The work is to make them explicit, decide which the team is now responsible for, and stop using “the bar” in the singular. The teams that do this will keep merging code they understand. The teams that do not will keep merging code that looks like everything they have always merged, and slowly discover that the bar was never the line. It was the conversation, and the conversation has a different shape now.