🦋🤖 Robo-Spun by IBF 🦋🤖
🌊➰🧭 FLOW 🌊➰🧭
Across labs and late-night notebooks, these scenes track a familiar arc—seasoned researchers begin with disavowal, try an AI cautiously, and then watch the work shift as the model contributes a missing step, a counterexample, or a tighter bound. Scott Aaronson documents a “key technical step” in a QMA proof supplied by GPT-5-Thinking; Ivan Nourdin’s team runs a controlled “junior collaborator” experiment that upgrades a qualitative theorem to quantitative rates; Sébastien Bubeck reports GPT-5-Pro sharpening an open convex-optimization bound; Terence Tao recounts an hour-long dialogue that ends with a verified counterexample; and the ‘Gödel Test’ probes the edge where tool and teammate blur. Together they form a small but telling corpus: skepticism, test, correction, contribution—and then a new standard for what counts as collaboration. (Shtetl-Optimized)
Terence Tao — the counterexample that wouldn’t blink
He begins in doubt. ‘An AI can’t possibly navigate this minefield of parameters,’ he thinks, opening a blank notebook and a guarded browser tab. The first attempts are clumsy—code that would run longer than a winter night, heuristics that point nowhere. He pivots from brute force to dialogue, trading lemmas like chess moves. The machine proposes ranges, then refines them when he presses; it catches the slips he almost lets through. After an hour the stalemate cracks: a crisp tuple, a glimmer of contradiction, and then verification—thirty lines of Python the AI itself suggested, simple enough to read line by line. The counterexample stands there, undeniable, like a mountain suddenly visible when the fog lifts. Disavowal gives way to craft: the expert and the engine, each doing what the other made possible. (MathOverflow)
Scott Aaronson — the eigenvalue that answered back
He isn’t looking for poetry; he’s looking for a function. The paper’s spine is a quantitative barrier in QMA amplification, and he has a matrix whose entries dance with θ like a hall of mirrors. He prods the model, wary—this is not a toy problem, this is decades of complexity theory staring back. GPT-5-Thinking doesn’t hand him a proof; it whispers the right object to study. The room goes quiet as the candidate clicks into place, and the bound that felt immovable begins to budge. Later, the arXiv entry will be dry and correct, but the memory is warmer: a moment when a tool spoke the language of the work. Disbelief doesn’t disappear; it is transmuted into a sharper standard—‘show me the step I didn’t see.’ The step appears. He takes it. (Shtetl-Optimized)
Ivan Nourdin (with Diez & da Maia) — from qualitative hush to quantitative voice
The experiment is designed like a lab with clean glassware: no heroics, no mythmaking. Take a qualitative fourth-moment theorem and ask, with discipline, whether GPT-5 can help force it to speak in rates—Gaussian, then Poisson. The model is not a magician; it is a junior colleague who suggests a route, stumbles, course-corrects, and leaves breadcrumbs that experts can bake into bread. When the manuscript is done, it reads like science should: careful claims, boundary lines inked in the margins, the word ‘quantitative’ no longer a hope but a heading. Skepticism remains—by design. But a new habit forms: when the silence of ‘we don’t know how fast’ hangs in the air, someone says, ‘let’s ask the thing.’ (arXiv)
Sébastien Bubeck — the bound that moved
The stage is smaller—just a post, a claim, a screenshot of lines and symbols—but the stakes are familiar: can this model sharpen a convex-optimization bound that has sat there, smug, in print? He frames it as a clean challenge, feeds it, and waits with the posture of someone prepared to say ‘no.’ The answer comes back tighter. He checks it himself. It holds. There will be debates about disclosures and details, about what counts as a solved problem without a full preprint. Yet in that first-person ledger, a mark is made: ‘I asked; it improved; I verified.’ A private motion of belief shifts from ‘not yet’ to ‘sometimes—under supervision.’ The door stays open. The problem space gets breezier. (X (formerly Twitter))
Coda — the benchmark that dared the models
Elsewhere, two researchers set five fresh conjectures like traps for complacency. GPT-5 steps through, stumbles on cross-paper synthesis, flashes originality on an easier case, even refutes a conjecture with an alternative guarantee. It is not a partner here; it is an examinee under bright lights. But the drama rhymes: doubt, attempt, adjustment, a few clean hits, a miss that teaches you where the edge still cuts. The test doesn’t crown a champion. It sharpens the question that all the stories share: when does the line between tool and teammate blur enough to change how we work? (arXiv)
[…] — When Doubt Blinked First: Dramatizations of Prestigious Scientists Working With GPT-5 […]