Performance Analysis of Repeated LLM Attempts at a Research-Level Mathematics Problem

Bartosz Naskręcki (co-generated with Claude Code)

This report analyses eleven independent attempts by a large language model to solve a research-level problem in arithmetic algebraic geometry, contributed as a FrontierMath Tier 4 problem. Only one of the eleven attempts produced the correct answer, yet collectively the attempts cover most of the solution space, a striking illustration of the last-mile problem in AI mathematical reasoning.