AI Solves Near-Impossible PhD-Level Math Problems in Landmark Testing

The mathematical problem-solving abilities of artificial intelligence (AI) models have recently improved significantly. According to a statement released by Ruhr University Bochum on Tuesday, AI models answered almost all questions in a test of 100 high-level mathematical exercises, with only two of the problems remaining completely unsolved.

Forty-nine international mathematicians developed these questions during a workshop at the Max Planck Institute for Mathematics in Leipzig. The complexity of the problems was at least equivalent to PhD-level coursework. Crucially, the required answers had to be unique and known to researchers, but they could not have appeared explicitly in any published literature.

The testing proceeded in several stages. First, five current large language models (LLMs) were each tested once, which left 41 of the tasks unsolved. The three top-performing models from this first round were then given the same set of questions 20 more times. The models' responses varied widely from run to run, and these repeated attempts reduced the number of unsolved questions to 16. Finally, researchers presented the questions three consecutive times to two specialized “heavy-thinking” models. These models solved another 14 exercises, leaving only two problems completely unsolved.
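
The stage-by-stage figures can be checked with a few lines of Python. This is only an illustrative tally of the numbers reported above, not the researchers' evaluation code; the stage labels and the `summarize` helper are made up for this sketch, and it assumes a problem counts as solved once any model has answered it correctly in any attempt.

```python
# Unsolved counts after each stage, as reported in the article.
# Labels and the helper below are illustrative, not from the study.
TOTAL_PROBLEMS = 100

stages = [
    ("single pass, five LLMs", 41),
    ("20 additional runs, top three models", 16),
    ("three runs, two heavy-thinking models", 2),
]


def summarize(total, stages):
    """Return (label, unsolved, newly_solved, solved_share) for each stage."""
    rows = []
    previous_unsolved = total
    for label, unsolved in stages:
        newly_solved = previous_unsolved - unsolved
        solved_share = (total - unsolved) / total
        rows.append((label, unsolved, newly_solved, solved_share))
        previous_unsolved = unsolved
    return rows


for label, unsolved, newly_solved, share in summarize(TOTAL_PROBLEMS, stages):
    print(f"{label}: {unsolved} unsolved, "
          f"{newly_solved} newly solved, {share:.0%} solved overall")
```

Under that reading, the first pass solved 59 problems, the repeated runs added 25, and the heavy-thinking models added the final 14, for 98 of 100 in total.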