What the interview actually tests now

Most hiring conversations about AI focus on a single question: did the candidate use AI to produce what they showed us, and how do we tell? It is the wrong question, and asking it is itself a signal that the interview hasn’t been retuned to what the work actually became.

The interview has always been an instrument calibrated to a specific theory of competence. AI tools didn’t break the instrument — they shifted what each part of it measures. Some signals still mean what they used to. Some quietly went to zero. A few inverted, which is the worst case, because the team running the loop scores them positively and only finds out at month four that the heuristic flipped.

Three categories, then a question worth replacing.

1. Preserved — signals that still mean what they used to

These are the signals anchored to judgment, not to production speed. The tool can’t fake them, doesn’t compete for them, and doesn’t depreciate them. Lean on them harder than you used to.

How a candidate talks about old code. Not the code on a screen — code from a previous project, described in a sentence. Why this and not that, what they would do differently, where the design got it wrong. Years of taste compress into ten minutes of talking. The tool has no taste; it has a distribution. This signal got more valuable, not less.
Debugging under uncertainty. Hand them a system they don’t know, a bug they don’t have a search query for, and watch how they form hypotheses. The shape of how someone gets unstuck is the shape of their seniority, and AI does not change it. If anything, it raises the stakes — engineers who can’t debug what the model produced for them end up debugging more, not less.
Disagreement with their own choices. Ask a candidate to argue the case against the design they just defended. The good ones do it without hesitation; they have been doing it inside their own heads for years. The weaker ones can’t separate themselves from their answers. AI-coached prep can polish a position; it cannot manufacture the reflex of holding one’s own work at arm’s length.
System thinking. Where does this belong, what does it touch, which invariant does it assume. Same signal, more important than ever, because AI-generated code is structurally biased toward locally-correct and globally-wrong. A candidate who thinks systemically catches what the tool can’t.

These signals were always the senior signals. They are now the signals, full stop.

2. Nullified — signals that quietly went to zero

These were the signals tied to production fluency — the parts of competence that were expensive to acquire and visible in the output. AI tools made the output cheap to produce without making the underlying competence cheap to acquire. The signal and the skill came apart, and the signal is what the interview was reading.

A clean implementation of X in twenty-five minutes. The classic live-coding artifact. It used to demonstrate fluency, recall, and disciplined construction. It now demonstrates that the candidate had a network connection. Even with AI use officially disallowed, candidates have absorbed enough patterns from AI-paired practice that the same artifact emerges from less competence than it used to require.
Polished take-home repos. The take-home was always a noisy signal — it measured time, environment, and motivation alongside skill — and AI made the noise dominant. A clean repo with thoughtful commits and a thorough README now tells you the candidate has a working AI loop and an afternoon. Both useful. Neither what the interview was trying to learn.
Framework and library trivia. Memorized syntax, command flags, the obscure config option. Cheap to look up, cheap to generate, cheap to fake. Selecting on this signal was always a proxy for “spent time in this stack,” and the proxy stopped working.
Fluent prose in cover letters and write-ups. Polished writing was a signal of care, communication ability, and willingness to invest in the application. It is now a signal of using a model. This doesn’t mean writing skill stopped mattering — it means the cover letter stopped reading it.

A nullified signal is harder to spot than a degraded one because the format keeps producing outputs. The instrument still moves. The needle is just measuring the room temperature instead of the patient’s pulse.

3. Inverted — signals the loop now scores backward

These are the dangerous ones — the heuristics that used to mean good and now mean concerning, or vice versa, with the interview loop still scoring them on the old polarity.

Speed. Used to be a positive: “ripped through it in twenty minutes” was strong. Under an AI loop, speed without visible deliberation is a yellow flag. The candidate who paused, restated the problem, asked a clarifying question, and produced a slower but considered solution is now the stronger signal. The fast-and-clean candidate may be doing the work; they may be doing the dispatch. Telling the two apart is the new live-coding skill, and most interviewers haven’t picked it up.
“I don’t use AI tools.” Used to read as principled or rigorous. Now it reads as someone who has not yet adapted to how the work gets done. Not a hard veto — strong judgment under any toolchain still beats weak judgment with the best tools — but a question worth asking explicitly: why not? The answer separates the engineer who has tried, evaluated, and rejected (rare, defensible, sometimes correct) from the engineer who has refused the question (common, increasingly costly).
“I write everything by hand for production.” Used to be a virtue. Now it is a stance, and stances cost the team. The team’s effective output curve has shifted; an engineer who refuses to participate in the new toolchain is participating in the old one alone, which means they are not participating in the team’s actual workflow. Worth probing: does this mean they think more carefully, or does it mean they have refused to engage with the change?
Confidence in the answer. Pre-AI, a confident answer paired with correct work was a strong signal of competence. Post-AI, confident-and-correct is the most common shape of a candidate who outsourced the answer and rehearsed the defense. The signal that now correlates with deep competence is calibrated uncertainty — the candidate who says “I think this is right but I would want to verify by trying X” before you ask. The instrument has to read uncertainty as competence in a way it didn’t have to before.

Catching inversions is the highest-leverage retune. A nullified signal at worst makes the interview useless. An inverted signal makes it actively wrong.

4. From authorship to ownership — and the test that gets you there

“Did they write this or did the model?” is the question every hiring manager asks. It is the wrong question, for three reasons.

First, the answer is almost always both, in proportions the candidate could not honestly report even if they wanted to. The work happened in a loop; inputs and outputs interleaved; the line between “I wrote” and “I accepted” is not where the question assumes it is.

Second, even when the answer is clear, it is not the right axis. A candidate who used AI to produce a take-home but understands every line, can defend every choice, can extend the work in real time, and can rewrite the design under pressure has demonstrated more than a candidate who hand-wrote a take-home and cannot articulate why they made the choices they made. The pen is not the signal. The understanding is.

Third, asking the question is itself a tell about the interview loop. A team focused on authorship is a team that hasn’t yet retuned its rubric to ownership. The two are different. Authorship is who typed; ownership is who can defend, modify, and rebuild. Authorship is unverifiable and increasingly meaningless. Ownership is verifiable in five minutes of follow-up and is what the team is actually hiring for.

The better question, asked of any artifact a candidate produced (code, write-up, design), is: can you change this on the fly? Take their solution, propose a constraint they didn’t have (“now the input is unbounded,” “now this has to run on a single thread,” “now the database is read-only for the next ten seconds”), and watch them work the modification live. Authorship stops mattering under that pressure, and ownership cannot be faked.
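A minimal sketch of what one such constraint change can look like, assuming a hypothetical top-k exercise. The function names are invented for illustration, and “unbounded” is read here as “arrives as a stream too large to hold in memory.” The first function is the kind of solution a candidate might bring in; the second is the kind of live modification the follow-up is looking for.

```python
import heapq
from typing import Iterable, List


def top_k_original(values: List[int], k: int) -> List[int]:
    """Hypothetical first version: sorts everything, so the whole input
    must fit in memory."""
    return sorted(values, reverse=True)[:k]


def top_k_streaming(values: Iterable[int], k: int) -> List[int]:
    """Hypothetical reworked version: one pass over any iterable,
    holding at most k items at a time."""
    if k <= 0:
        return []
    heap: List[int] = []  # min-heap of the k largest values seen so far
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            # v beats the smallest of the current top k, so swap it in.
            heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True)
```

The code matters less than the conversation around it. A candidate who owns the original can explain why sorting stops working, what the heap costs per item, and what happens when k is zero, without being prompted for each point.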

The operational form of this question, and the piece worth adding to the loop, is the disagreement signal. Place the candidate in a position where they have to disagree with AI output and defend the disagreement. Hand them a small piece of plausible-looking code that has a bug, ideally one of the recognizable shapes of confident wrongness: locally correct but globally wrong, a stale pattern, a plausible-looking API that does not exist, an invariant violation. Tell them this is what came out of an AI tool. Ask them whether to ship it.

What they say next falls into three tiers, strongest first:

The strongest candidates spot the failure, name it, articulate why, and propose what to ship instead. They are confident-but-not-defensive about disagreeing with the tool.
The next tier senses something is off but cannot localize it, then either works toward the issue under pressure (good signal) or gets stuck in a way that reveals the limit of their evaluation skill (also a signal; calibrate to seniority).
The weak signal: the candidate skims, says “looks fine,” and recommends shipping. This is not a question of taste; it is a question of whether the candidate can review at all under the new conditions. Most of the work is now this kind of review.

The disagreement signal is hard to fake, hard to coach for, and aligned with the modal failure mode of AI-augmented work. Add it to the loop. Replace one of the nullified signals with it.
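One shape the planted-bug exhibit can take, sketched under assumptions: the `Account` class, both function names, and the non-negative-balance invariant are all hypothetical, made up for this illustration. The first function is what gets handed to the candidate as “AI output.” The second is the kind of fix a strong candidate might propose.

```python
class Account:
    """Hypothetical account. The imagined surrounding system assumes
    balance never goes below zero."""

    def __init__(self, balance: int) -> None:
        self.balance = balance


def transfer_as_generated(src: Account, dst: Account, amount: int) -> None:
    # The exhibit: each line is correct on its own, but nothing stops
    # src.balance from going negative, so the invariant can break.
    src.balance -= amount
    dst.balance += amount


def transfer_reviewed(src: Account, dst: Account, amount: int) -> None:
    # A possible fix: validate before mutating either account, so a
    # rejected transfer leaves both balances untouched.
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > src.balance:
        raise ValueError("insufficient funds")
    src.balance -= amount
    dst.balance += amount
```

The exhibit passes a skim because the arithmetic is right. What separates the tiers is whether the candidate asks what the rest of the system assumes about `balance` before answering the ship question.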

Closing

If your interview loop hasn’t been retuned, you are hiring against a model of competence the work no longer rewards. Some of your best candidates are getting filtered out for reasons that no longer apply. Some of your worst hires made it through because the inverted signals scored them well. Neither problem fixes itself with stricter rules around AI use; both ease once the rubric is updated to read for ownership and calibrated uncertainty rather than authorship and confident production.

The retune is mundane in shape and uncomfortable in practice. The work happens in calibration meetings — stop weighting X, start asking Y — with the same rubric writers who set the previous version. None of it is technical. All of it is institutional. Teams that go through the conversation come out hiring engineers who can do the work the job has actually become. Teams that skip it find out the cost a few months later, in a debugging session where the new hire cannot evaluate output the rest of the team can no longer afford to re-review on their behalf.