Terence Tao - Machine-Assisted Proofs (February 19, 2025)
I'll be speaking about how machines are transforming the way we do mathematics. This is a time of great change — not only for mathematics, but certainly for mathematics too. We are a very traditionalist discipline: we still use blackboards; I have four blackboards in my own office. Other sciences have embraced "big science," with 50, 500, 5,000 collaborators; mathematicians have somewhat reluctantly moved from one collaborator to three or five. In many ways we still do things as we have for centuries: working on individual problems, advising individual students. But it is changing. We are now using machines in many interesting ways, though not yet at the point of a revolution. There is not yet a killer app. It's as if the internet had been invented but not yet email; email was the first killer app that triggered mass adoption of the internet. We're not quite there, but you can see it coming.
Right, so we're talking about how machines have been influencing mathematics. I know many of you are not mathematicians, so I'll discuss some examples from research mathematics, but at a very high level; you'll see a few equations, but they won't be the main focus — they're artistic background, in some sense. In some sense we've been using machines longer than anyone else: machines have assisted mathematics for millennia. Here is a machine assisting a mathematician from the first century. So what's the difference now? It's the scale of how much machines can help us, but also the nature of how they help us.
So what have we been using machines for, historically? The abacus is already an example of computation, but besides computation there is the building of tables. In the early Middle Ages people started building trigonometric tables and logarithm tables, which made certain computations much, much easier to do. Of course these tables are completely obsolete now that we have calculators and then computers, but we still rely very much on tables; we just call them databases now.
There has also long been a tradition of experimental mathematics. It is much smaller, and honestly not as well respected as it should be, compared to theoretical mathematics, which makes up something like 99% of the subject; it's very imbalanced. I do feel that in the future we will have a much better balance between theory and experiment, closer to how it is in the other sciences. But it has a history. In the 18th century, Legendre and Gauss famously investigated the prime numbers. Gauss in particular, who was himself a kind of human computer in many ways, built tables of something like the first 100,000 primes, and they conjectured the prime number theorem, which was then proven about a century later. But the experiments came first.
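Gauss's experiment is easy to reproduce today. A minimal sketch (my own illustration, not from the talk) comparing the prime-counting function π(x) with the x/log x approximation that the prime number theorem makes precise:

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: return all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # Mark all multiples of p starting from p*p as composite.
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if sieve[i]]

# Compare pi(x) with x / log x, the pattern conjectured from tables.
for x in (10**3, 10**4, 10**5):
    pi_x = len(primes_up_to(x))
    print(x, pi_x, round(x / math.log(x)))
```

The ratio of the two columns drifts slowly toward 1, which is exactly the kind of pattern one can spot in a table long before proving it.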
In a very similar vein, one of the major open questions in number theory nowadays is the Birch and Swinnerton-Dyer conjecture. It's a conjecture about elliptic curves, and I'm not going to describe what it says, but again it was first discovered by compiling lots and lots of tables of elliptic curves, computing various statistics of interest, and noticing, experimentally, a very strong relationship between two otherwise unrelated objects. We still don't know why they're related, but we strongly believe they are, and this led to the conjecture. Then there is perhaps the biggest and most successful mathematical database: the Online Encyclopedia of Integer Sequences.
This is something that many mathematicians, in some fields, use almost daily. The literature of mathematics is huge. If you discover you're working with some mathematical object, then if you're lucky you know its name and can look it up on Wikipedia, or you know an expert in the right area and can ask them what's known about it. But often there are two mathematicians across the world who have worked on the same object and don't realize it, because they call it by different names; they have no way of comparing notes, or even of searching for whether anyone else has dealt with this funny thing. But many objects in mathematics come with integer sequences.
For example, if you're studying the Platonic solids: there are five of them, and the tetrahedron has four vertices, the cube has eight vertices, and so forth. So there is a sequence of numbers you can attach to many mathematical objects, and Neil Sloane had the genius idea of building a database of all the integer sequences that have ever come up in mathematical problems somewhere, or at least as complete a database as one could hope for. This has now become the Online Encyclopedia of Integer Sequences, with hundreds of thousands of sequences.
Every day, a mathematician trying to understand an object can generate an integer sequence from it and look it up in the database, and very often it, or something very similar, has already shown up there, and you can make connections that would otherwise not have been discovered. So that's one classical use of machines: data tables. The other major use is what we call scientific computation or, more colloquially, number crunching.
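To make the OEIS workflow concrete, here is a hypothetical toy example (my own, not from the talk): compute the first few terms of a counting problem, then search for them on oeis.org.

```python
def count_no_adjacent_ones(n):
    """Count binary strings of length n with no two adjacent 1s."""
    a, b = 1, 2  # counts for string lengths 0 and 1
    for _ in range(n):
        a, b = b, a + b  # each longer string ends in '0' or '01'
    return a

# Generate the first few terms, then search for them in the OEIS:
seq = [count_no_adjacent_ones(n) for n in range(1, 9)]
print(seq)  # [2, 3, 5, 8, 13, 21, 34, 55]
```

Pasting those terms into the OEIS search box identifies them as (shifted) Fibonacci numbers, A000045 — instantly connecting the counting problem to a classical, well-studied sequence.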
This is what you would normally think of when you say we use computers to do mathematics: running some massive simulation, or solving a massive system of equations — modeling a dynamical system, finding roots of a polynomial, and so forth. It has been around for over 100 years; it predates electronic computers.
Arguably the first major scientific computation was done by Hendrik Lorentz in the 1920s, using a team of human computers to model fluid flow for a dam being constructed in the Netherlands. He had to invent floating-point arithmetic, in fact, to run the simulation. The dam was successfully built, and the predictions held up. So those are the traditional uses. These number-crunching machines, besides doing lots of arithmetic — lots of floating-point operations and so forth — can also do certain logical tasks.
If you have a whole bunch of statements, each either true or false, and you know some relationships between them, you can try to work out which are true and which are false, rather like solving a Sudoku or a logic puzzle. There is a class of problems called satisfiability problems for which we have automated tools. So certain math problems can be reduced to satisfiability: a logic puzzle with a certain number of statements that are true or false and a certain number of hypotheses, from which conclusions can be derived — and some of that you can automate, as long as the problem is not too large.
Now, a lot of mathematics cannot be solved by this method, because it only works with a finite number of hypotheses and conclusions, but sometimes it works. Let me give one example of a massive satisfiability problem: a long-standing conjecture called the Boolean Pythagorean triples problem, which was only solved by a massive computer search. Pythagorean triples, like 3, 4, 5, are the side lengths of right-angled triangles. They are fairly sparse: there's 3, 4, 5, then 5, 12, 13, and so on. But it was believed — and has now been proven — that these patterns are common enough that no matter how you divide the natural numbers into two classes, one class is guaranteed to contain a Pythagorean triple.
In fact you don't even need to partition all the natural numbers; you just need to partition the first 7,825 of them. No matter how you break this set into two pieces, one of them contains a Pythagorean triple. On the other hand, with only the first 7,824, there is a way to split them so that neither class contains a triple — and that part is fairly easy.
You just exhibit one partition with no monochromatic triple, which you can check with a very simple computer program. The hard part is the other direction, because the number of ways to break this set into two pieces is 2^7825, a humongous number; you cannot check all of them by hand. It's basically a vast Sudoku with 7,825 boxes, and you have to show this Sudoku has no solution. At the time, this achieved the record for the world's longest proof: generating the proof certificate required about four CPU-years of computation.
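The easy direction is simple to reproduce at toy scale. Here is a minimal backtracking sketch (my own illustration — nothing like the industrial SAT solver actually used) that finds a triple-free 2-coloring of {1, …, n} for small n:

```python
def pythagorean_triples(n):
    """All triples (a, b, c) with a^2 + b^2 = c^2 and c <= n."""
    squares = {k * k: k for k in range(1, n + 1)}
    return [(a, b, squares[a * a + b * b])
            for a in range(1, n + 1)
            for b in range(a, n + 1)
            if a * a + b * b in squares]

def two_color(n):
    """Backtracking search for a 2-coloring of 1..n with no
    monochromatic Pythagorean triple (only numbers that appear in
    some triple matter). Returns a coloring dict, or None."""
    triples = pythagorean_triples(n)
    nums = sorted({x for t in triples for x in t})
    color = {}

    def conflict():
        return any(a in color and b in color and c in color
                   and color[a] == color[b] == color[c]
                   for a, b, c in triples)

    def backtrack(i):
        if i == len(nums):
            return True
        for col in (0, 1):
            color[nums[i]] = col
            if not conflict() and backtrack(i + 1):
                return True
            del color[nums[i]]
        return False

    return dict(color) if backtrack(0) else None

print(two_color(100) is not None)  # True: a valid coloring exists
```

At n = 100 this finds a coloring almost instantly; the point of the theorem is that at n = 7,825 the same search space is 2^7825 and provably contains nothing.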
The proof was originally 200 terabytes — perhaps in AI circles there are bigger data sets, but to us that was a very large number — and they managed to compress it down to a mere 68-gigabyte certificate. At one point it was the world's longest proof. So that is an example of the traditional ways of using computers in mathematics. But what is exciting is that there are several new ways of using computers, which people are individually finding niche ways to incorporate into their workflow.
Of course there are mundane ways: we all use email, we all use computers to write papers, search for papers, and download things from the internet. I'm not interested in those. There are three new ways that are beginning to become transformative, though not yet at the killer-app stage. One is machine learning: using things like neural networks to find patterns and relationships in large data sets that you just could not see by hand. Possibly you could discern them with a more traditional statistical analysis, but most mathematicians don't have the training to work with these data sets, and machine learning algorithms can automate this at a scale we simply do not have the training to match.
Then of course there are the high-profile large language models — ChatGPT, Claude, Gemini, and so forth. First of all, they are making the other tools easier to use: it's easier to run machine learning, or to code something up, if you can just ask the model to do it for you. They are also beginning to solve some simple problems in mathematics, and to assist with secondary tasks in mathematical activity. So far they are not good at directly solving hard math problems, but that may not be the best way to deploy them.
Then there are formal proof assistants, which are complementary to all of these. They are very, very fussy proof checkers: if you write a proof in a very specific language, the proof assistant will go through every line and either say "tick, this is a correct proof," or report something like "correct up to line 567, and then a syntax error." It is much like compiling computer code — except that computer code generates executable programs.
Formal proof assistants are technically also programming languages, so they can do that too, but they additionally generate proof certificates. That is nice in itself, but it also enables other things — for example, finally allowing mathematicians to collaborate in large groups, which we have basically not been able to do before without a lot of painful effort. Individually these tools are becoming useful, but what's really promising is combining them; that's where the big leap is going to come, from some synthesis of all of these. Right now these tools don't talk to each other very well, but they should in the future.
Let me talk about proof assistants first. Computer-assisted proofs have been around since about the 1970s. The most famous early example was the Four Color Theorem, which required checking a lot of cases. The slides don't show the proof, but it involved generating a list of about 5,000 special graphs and verifying that these graphs have various properties. At the time, some of these properties could be verified by computer — and by computer I mean a 1970s punch-card-type computer.
Others could be proven in a very mechanical way, but not by the computer itself. In fact, I believe Appel and Haken got Haken's daughter to write down, for each of these graphs, a five- or six-line justification of a certain property — and she had to do it thousands of times. It was very error-prone; they had to revise it several times, and it took a long time.
It took about 20 years before a modern computer-assisted proof, acceptable by modern standards, was produced. There was a new proof, a bit simpler, using only around 500 graphs. More importantly, the computational part was isolated as a clean statement: you could state exactly what needed to be true, and anybody could spend an hour or two writing a computer program to verify that specific computational task, something that runs in seconds on one computer. They supplied the code — but you still had to trust the code; maybe the person who wrote it had a bug.
If you really want a formal proof certificate — a rigorous guarantee that the statement can be proven all the way from the axioms of mathematics — that was only done in 2005, when a proof assistant was used to give a complete proof. So you see, it traditionally takes decades to get to the point where you can actually formalize.
Another famous example is the Kepler conjecture. This is a conjecture from the 17th century about packing oranges, or cannonballs, in space. There is an obvious way to do it — the way you see oranges stacked in supermarkets — called the face-centered cubic (or hexagonal close) packing. There is some empty space between the spheres, and the proportion of space occupied by the spheres is about 74%: the number π/√18.
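The 74% figure can be checked in a few lines. A quick sketch (my own illustration) computing the face-centered cubic density from the unit cell:

```python
import math

# Face-centered cubic packing: a cubic unit cell of side 2*sqrt(2)*r
# contains 4 full spheres of radius r (8 corner eighths + 6 face halves).
r = 1.0
cell_volume = (2 * math.sqrt(2) * r) ** 3
spheres_volume = 4 * (4 / 3) * math.pi * r ** 3
density = spheres_volume / cell_volume
print(round(density, 4))  # 0.7405, i.e. pi / sqrt(18)
```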
Kepler conjectured that this is the best way to do it: no other packing of spheres does better than the obvious one. But this turned out to be remarkably difficult to prove. Two dimensions is not too hard, but the three-dimensional version is really, really hard. By the 1950s, though, there was a strategy: whenever you have a packing, it creates a way to decompose space.
Every sphere comes with what's called a Voronoi cell: the set of all points that are closer to that sphere than to any other. This is a certain polyhedron around the sphere. So every time you have a packing, you get all these polyhedra, with their various faces, areas, and volumes, and there are various relationships between them.
You can relate the volumes of touching polyhedra through various inequalities, and these in turn relate to the density of the whole packing. So, in theory, if you could collect enough inequalities between all these polyhedra and then run some sort of linear program, you might be able to deduce the bound on the density.
This was a promising strategy: in principle you only need a finite number of these inequalities — something a computer could potentially handle — as opposed to the full packing problem, with its infinitely many configurations. People tried this, and there were many false attempts and disputed proposed proofs. It was finally done in the 1990s by Hales and Ferguson.
The thing is, the strategy as stated doesn't quite work: you have to replace the Voronoi cells with more complicated polyhedra that have no obvious description. Hales kept tweaking the definition of these cells, and instead of taking obvious measures like volume, he introduced a numerical "score" for each — and the score kept changing as the proof advanced, because they would try one thing, it wouldn't work, they would add one more bell and whistle, and it still wouldn't work. Finally they got it, but it was a very controversial proof.
Let me just quote Hales: every time we encountered difficulties in solving this problem, we adjusted the scoring function to skirt the difficulty. The function became more and more complicated, but with each change we could cut months or years from the work. This incessant fiddling was unpopular with my colleagues; every time I presented my work at a conference, I was minimizing a different function. Even worse, the function was mildly incompatible with earlier papers, and this required going back and patching the earlier papers.
They finally did it: they chose a very carefully crafted scoring function, with something like 150 variables, ran a linear program, and verified it. They hadn't started off with computer assistance — they were hoping for a pen-and-paper proof — but they were forced into a computer-assisted one. At the time it was quite a large proof: 250 pages, plus 3 gigabytes of various notes and data.
They sent it to the top journal in mathematics, the Annals of Mathematics. It took four years to referee — some papers have actually taken longer, but that was pretty long — and in the end the referees said they were only 99% certain of its correctness; they could not reproduce the calculations.
So the journal did the very unusual thing of putting a caveat, a disclaimer, at the beginning of the paper, saying that the editors do not vouch for its correctness. (They have since removed that from the current online version.) It was very controversial at the time. So there was a lot of incentive, especially for Tom Hales, to formalize the proof in a computer language, leaving no doubt that some little sign error somewhere had invalidated the whole thing.
So he made a project of it. He got an NSF grant. He estimated it would take 20 years to formalize every single step, and he was very pleased that it only took 11. The formal proof was finally published in 2017, with many, many collaborators. And that's where things stood until very recently: you could formalize, but it was so painful that you would only want to do it for very, very high-profile projects.
But now the technology has gotten a lot better. Still not perfect by any means, but you no longer have to spend years on these things; you can formalize in weeks. Very recently, together with three co-authors, I proved a conjecture in combinatorics called the polynomial Freiman–Ruzsa conjecture — which, despite the name, was actually introduced by Katalin Marton. It's not important what the conjecture says; it's just some piece of math.
We proved it by traditional means: we talked to each other, exchanged emails, worked at the blackboard, and so forth, and we have a 33-page paper — which has actually just been accepted by the Annals — proving this result. But then I decided this was a good project for testing the most recent infrastructure for formalizing mathematics in a modern proof assistant language called Lean. So we opened a project, and it took about three weeks, involving some 20 people, most of whom I had not met previously. The thing is, as I said, a group of four or five people is close to the maximum size of a math collaboration currently, because in math you have to trust every single piece of an argument.
If someone supplies a proof of one step and you don't understand it, maybe there's an error in it, and then the whole thing can be wrong. So you have to verify every contribution everyone else makes, and this is a significant limiting factor on enlarging collaborations beyond four or five people. But you can crowdsource a proof formalization in ways that you cannot crowdsource a traditional proof. (I've just realized my time is almost up, so I'm going to have to speed up.)
The way a modern proof formalization works is that you take a big, complicated statement and break it up into lots of little pieces, lots of little lemmas, each of which is fairly simple — perhaps a simple consequence of other pieces. The statement we wanted, the polynomial Freiman–Ruzsa conjecture, is the little bubble at the bottom of this dependency graph. At the time I took the screenshot it was an empty bubble, meaning it hadn't been proven yet; it depends on these four other bubbles, some of which we had already proven and some of which were ready to be proven. (There's a color coding, which I won't go into.) The point is that you can then crowdsource: people volunteer, "I'm going to work on this one proposition," and each person only needs to prove that one lemma follows from the few statements above it. You don't need to understand the whole proof.
This was combinatorics, but most of the contributors were not combinatorialists; in fact, many were not even professional mathematicians. There were computer scientists, people from industry, people who just liked solving puzzles — a very broad class of people working on this. And every contribution had to be formalized, had to be certified by this language, Lean, so we could trust all of it: it didn't matter that I didn't know these people, and maybe they would make mistakes, because a mistake simply wouldn't get accepted into the project. So we've been able to do these large collaborations. This is just one example of what Lean code looks like: it looks vaguely like math, but also like computer code — some sort of weird hybrid of both. In many ways it's still more tedious than writing things traditionally.
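To give a concrete flavor — this toy lemma is my own illustration, not a line from the PFR project — a short Lean 4 proof looks like this:

```lean
-- A toy Lean 4 proof that addition of natural numbers is commutative.
-- The checker accepts it only if every step follows from the axioms
-- and previously proven lemmas.
theorem my_add_comm (m n : Nat) : m + n = n + m := by
  induction n with
  | zero => simp
  | succ k ih => rw [Nat.add_succ, ih, Nat.succ_add]
```

Each bubble in the dependency graph corresponds to a `theorem` like this, stating exactly which earlier lemmas it relies on.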
I would say that currently, writing a formal proof takes roughly ten times longer than writing the proof in the traditional way, in LaTeX and so forth. So it's not yet worth doing for everything; you still have to pick and choose which projects to formalize, and we're still focusing on relatively high-profile ones. Maybe the highest-profile currently: Kevin Buzzard is one year into a five-year project to formalize the entire proof of Fermat's Last Theorem. He estimates that in five years he may not formalize all of it — Wiles' proof was in the mid-90s — but he wants at least to reduce the statement to facts that were already known in 1980. There will still be work to formalize those, but that seems to be the current goal. So that's one way mathematics is being transformed: we are now able to collaborate at large scales. The example I gave was formalizing something that had already been proven; later, hopefully, I'll say a little about formalizing mathematics where the results haven't been proven yet, where the formal structure is actually part of how we arrive at the proof.
Another direction is to use machine learning. This is one of my favorite stories; it comes from knot theory, a sub-branch of topology. You make a loop in space; there are various knots you can make, and some can be deformed into one another while others are genuinely different. One of the basic questions in knot theory is how to tell whether two knots are equivalent. One of the ways we do it is by assigning invariants: certain numbers attached to a knot that don't change as you continuously deform it, so that if two knots have different invariants, they are not the same knot. There are various ways to construct invariants. There is the signature of a knot, a certain integer; it comes from combinatorics — you count crossings, whether they cross over or under, and there's a linking matrix, some combinatorial recipe. Then there are geometric invariants, where you look at the space outside the knot: three-dimensional space with the knot removed is what's called a hyperbolic space, and you can define things like the hyperbolic volume and the hyperbolic cusp volume — lots of real and complex numbers attached to the knot. I'm not going to define these, but here is a partial database of a whole bunch of knots, with various statistics attached to them. So there are these two unrelated ways to generate invariants — a combinatorial recipe and these geometric recipes — and there was no known connection between the two.
A group of mathematicians — I forget the authors; they're not listed here — decided to just throw machine learning at this. Could a neural network, fed the geometric invariants — these roughly 20 real and complex numbers attached to a knot — predict the signature? They found a network that could do this with something like 90 or 99% accuracy; superbly accurate. So this told you there must be some relationship, that the invariants were related — but the relationship was given by a black box, a neural network, a very complicated set of weights and functions. Somehow you had to open up this black box and understand it.
So they did a very basic analysis. Once you have this black box, it's a box that takes in 20 numbers and spits out one predicted number, so you can just turn the knobs: change one input and see how much the output changes. They found that of the 20 inputs, 17 did almost nothing, but three really mattered: the longitudinal translation, and the real and imaginary parts of the meridional translation. These were inputs they had not expected to be important — for example, they had expected the hyperbolic volume to be the most important invariant, and it turned out to be almost insignificant. So with this analysis they identified the three key inputs, plotted them against the signature, saw a relationship visually, and made a conjecture by fitting a curve to the data.
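That knob-turning analysis is easy to sketch in code. Below, a hypothetical stand-in for the trained network (a function of 20 inputs where only three actually matter — all names and coefficients here are my own invention) is probed by perturbing one input at a time:

```python
def sensitivity(f, x, eps=1e-3):
    """Finite-difference estimate of how strongly each coordinate
    of x influences the black-box function f."""
    base = f(x)
    out = []
    for i in range(len(x)):
        y = list(x)
        y[i] += eps  # turn one knob slightly
        out.append(abs(f(y) - base) / eps)
    return out

# Hypothetical stand-in for the trained network: 20 inputs, but only
# inputs 3, 7 and 11 drive the output, as in the knot experiment.
def black_box(x):
    return 2.0 * x[3] - 1.5 * x[7] + 0.5 * x[11]

x0 = [0.1 * i for i in range(20)]  # any probe point works here
scores = sensitivity(black_box, x0)
top3 = sorted(sorted(range(20), key=lambda i: -scores[i])[:3])
print(top3)  # [3, 7, 11]: the inputs worth plotting against the target
```

Real saliency analyses on neural networks use gradients rather than finite differences, but the idea is the same: rank inputs by how much wiggling them moves the output.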
Once they had this prediction, they compared it against the neural network, and the network said no: this conjecture can't be true, because here are some examples of knots that don't obey it. But the way the conjecture failed showed them how to fix it. They added an extra term to make it more accurate, and once they had the right form of the conjecture, they could actually prove it theoretically. So it was really a conversation between theory and experiment, between the machine learning and the humans — and in the end they rigorously proved a connection between these invariants. This is another way mathematics is changing, but it needs data. What made this work was that there was already a database of a million knots; they added two million more to the existing database, and then they were able to do this. Right now most of mathematics does not have such databases; we are still a data-poor science in many ways.
I have some slides on large language models, but maybe there's not much to say that hasn't already been said. Sometimes they can solve, for example, Olympiad-level problems — high-school math competition problems that most high-school students can't solve — and sometimes they get perfect answers. And sometimes they can't do basic arithmetic: you can give one an arithmetic sum and it will happily spit out the wrong answer, and when you point out that it was wrong, it says, "I'm sorry, I had a typo," and so forth. It's pattern matching. It's weird: somehow the set of tasks that large language models are good at is almost orthogonal to the set of tasks that humans are good at. Things we find hard, they can find easy, and things we find easy, they can find hard. It's a very strange tool.
But they are already useful by themselves for various things. If you want to quickly learn a subject that's not quite in your field, it's like a really good interactive version of Wikipedia. You can do literature searches for basic things. They can write code; they can format and write for you. If you have a tricky LaTeX image you want to make, they can do that. There are lots of secondary tasks for which they're very useful. You can also ask for suggestions on how to solve a problem, and you'll get ten suggestions, of which seven are rubbish and two are things you already thought of, but one might be an interesting idea. People are beginning to combine all these different ways of using computers together. There's a very promising approach here by various groups. One of the biggest problems in mathematical fluid equations is to construct solutions, initial conditions for fluids, that actually develop singularities in finite time. There's a particular equation called Navier–Stokes for which we really want to construct such solutions, but there are a lot of other, simpler fluid equations for which we would also love to rigorously show blow-up. And there seems to be a two-step process for this.
The way to do this, we now think, is to first use machine learning and similar tools to construct approximate solutions that almost blow up, but to really high accuracy, so that they're within, say, ten decimal places of something that actually does blow up in some sense. And then you combine that with some rigorously verified analysis that shows that any approximate blow-up solution has to be close to an exact one. So you can use whatever non-rigorous, error-prone AI you wish to find the candidate, but then you couple it with rigorous verification at the end. This has already worked for simple fluid equations, and we're slowly climbing the ladder to more sophisticated ones. I think in ten years we might actually get Navier–Stokes. It'll be a big program, but we can see the path.
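As a toy illustration of the shape of this two-step paradigm, here is a sketch on a much simpler stand-in problem (root-finding rather than a PDE; this is an illustration, not the actual fluid-equation computation): any non-rigorous method finds a candidate, and then a rigorous certificate, in exact arithmetic, confirms a true solution nearby.

```python
# Step 1 below is non-rigorous (it could be ML, Newton's method,
# anything); step 2 is a rigorous a-posteriori certificate.

from fractions import Fraction

def f(x):
    return x**3 - 2*x - 5

# Step 1: non-rigorous search for an approximate root.
x = 2.0
for _ in range(20):
    x = x - f(x) / (3*x**2 - 2)   # Newton iteration

# Step 2: rigorous certificate. Work in exact rational arithmetic and
# verify a sign change on a tiny interval around the candidate; by the
# intermediate value theorem, a true root lies inside that interval.
eps = Fraction(1, 10**6)
a = Fraction(x).limit_denominator(10**9) - eps
b = a + 2*eps
assert f(a) < 0 < f(b), "certificate failed"
print(f"exact root certified in [{float(a):.8f}, {float(b):.8f}]")
```

The point is that the search step is allowed to be unreliable, because nothing from it survives into the final argument except a candidate that the rigorous step independently verifies.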
Okay, so as you saw, large language models are very bad at arithmetic; they make so many mistakes. Using them to directly calculate in math is very unreliable. Weirdly, it works better for very high-level math without much numerical calculation; the more numbers there are, the more error-prone it is. So what we're slowly beginning to realize is that the way to proceed is to get the large language model not to do the math directly, but to generate code in a more reliable language like Python, or another traditional computing language, and then run that code to solve your problem. This seems to be a much more promising paradigm.
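As a toy example of this paradigm (a hand-written stand-in for what model-generated code might look like): instead of asking the model for the value of a sum directly, have it emit a short program that computes the value exactly.

```python
# Rather than trusting a language model's in-context arithmetic,
# have it emit a program like this and execute it. Rational
# arithmetic gives an exact answer with no rounding error.

from fractions import Fraction

def partial_harmonic_sum(n: int) -> Fraction:
    """Exact value of 1 + 1/2 + ... + 1/n."""
    total = Fraction(0)
    for k in range(1, n + 1):
        total += Fraction(1, k)
    return total

print(partial_harmonic_sum(10))   # 7381/2520
```

A model asked for this value directly will often hallucinate digits; running generated code sidesteps that failure mode entirely.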
For example, there are many problems where the task is to construct some high-dimensional counterexample, and there's a really high-dimensional space of possible candidates, and standard optimization or machine learning doesn't work. But instead writing a much smaller computer program that generates a candidate, running it, and iterating on the code has already begun to improve some mathematical results. For example, the fastest matrix multiplication algorithm for large matrices is now due to this AI tool called FunSearch, created by Google DeepMind. And there are these ongoing AI Math Olympiad challenges; I'm actually on the scientific advisory board of this one. The dream is to get an AI to achieve gold-medal performance at one of these Olympiads, hopefully using something open source that can actually be replicated by other people, not just something requiring $10 million of compute and lots and lots of fine-tuning. There are open-source models now that can compete, but we haven't yet achieved performance at the gold-medal level.
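The pattern here, roughly as in FunSearch, is to search the small space of short generator programs rather than the huge space of objects. Here is a hand-written sketch of that pattern, not DeepMind's code: a priority function (my choice of a classical base-3 heuristic, purely illustrative) guides a greedy construction of a set with no 3-term arithmetic progression; in the real system, an LLM repeatedly rewrites the priority function and keeps the variants that score best.

```python
def priority(k: int) -> int:
    # Illustrative heuristic: prefer numbers whose base-3 expansion
    # avoids the digit 2 (the classical progression-free construction).
    m = k
    while m:
        if m % 3 == 2:
            return 0
        m //= 3
    return 1

def greedy_ap3_free(n: int) -> list[int]:
    """Greedily add elements of {0,...,n-1} in decreasing priority,
    keeping the set free of 3-term arithmetic progressions."""
    chosen: list[int] = []
    for k in sorted(range(n), key=priority, reverse=True):
        s = set(chosen)
        # Adding k must not complete a progression with k as an
        # endpoint or as the middle term.
        ok = all((a + k) % 2 or (a + k) // 2 not in s for a in chosen) \
             and all(2*a - k not in s and 2*k - a not in s for a in chosen)
        if ok:
            chosen.append(k)
    return sorted(chosen)

print(greedy_ap3_free(27))   # [0, 1, 3, 4, 9, 10, 12, 13]
```

The object produced is high-dimensional (a subset of a large set), but the artifact being searched and improved, the priority function, is only a few lines of code.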
One of the problems, actually, is grading the output of these models. We don't have enough human graders to evaluate the output, so we have to content ourselves with problems that have numerical solutions, like a problem whose answer is a three-digit number, because those we can check automatically. But they did surprisingly well. At a medium-tier Olympiad level, they can now achieve 50 to 60% performance. Again, the secret was not to solve the problem directly, but to generate code that would then solve the problem. There's a similar algorithm that solves geometry questions, but maybe I will skip that.
There has also been success in getting AIs to output not Python but these proof-verification languages like Lean. This is a recent success in solving genuinely Olympiad-level questions: convert the question into a Lean statement, and then use a version of DeepMind's AlphaZero program, which is what they used to play games like Go, viewing these math problems as a massive game where there are certain moves you can make and you just need to get from A to B. And they can prove quite difficult Olympiad-level problems this way, after massive, massive compute. Although the proofs are really, really weird; they're completely inefficient.
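To give a flavor of the target format, here is a toy example I've written (not AlphaProof's actual output, and it assumes the Mathlib library): the claim is stated formally, and the proof is a sequence of tactic "moves" that the Lean kernel checks mechanically.

```lean
import Mathlib

-- A toy Olympiad-flavored inequality: a² + b² ≥ 2ab for integers.
-- The hint (a - b)² ≥ 0 is the whole mathematical content; the
-- `nlinarith` tactic does the remaining algebra and the kernel
-- certifies every step.
theorem sq_sum_ge_two_mul (a b : ℤ) : a ^ 2 + b ^ 2 ≥ 2 * a * b := by
  nlinarith [sq_nonneg (a - b)]
```

Because the kernel checks everything, a proof found by an unreliable search is still fully trustworthy once it compiles, which is exactly what makes this output format attractive for AI systems.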
For example, they solved this IMO problem, and the first step is to induct on the number 10, which makes no sense. It's actually a step you could just delete, and it would still be a valid proof. And there's some human commentary here: they spend 169 lines proving a lemma, and then it never gets used. It's a completely alien type of proof. This is not what a human would write, but it can solve these questions, and we don't really understand why, partly because we don't have access to the actual source code; this is all just what DeepMind reports. But anyway, you can solve these Olympiad problems. One thing I've been involved in is trying to use these tools to actually do new math, rather than just solve existing questions. So I created this test project. One thing these tools can maybe do is mass exploration of many, many math problems at once. Mathematicians right now can only work on one problem at a time: we spend months on it, and then we go to the next problem. But potentially you can now try to attack millions of problems, not the most difficult problems but millions of medium-difficulty problems, using computers. That's a different style of more experimental mathematics, which I think will become more and more prevalent.
Here is one example, just as proof of concept, that we started doing. You may be familiar with various laws of algebra, like the commutative law x * y = y * x and the associative law (x * y) * z = x * (y * z). Some operations obey these laws and some don't. So we just proceeded to generate about 4,000 laws of algebra. The commutative law is this one here, x * y = y * x, but there are many stranger ones. We generated all these equations and gave them names, and some of them imply others. For instance, if x is always equal to x * y, then it turns out that (x * y) * z is always equal to (x * w) * u; there's some algebraic deduction by which you can use some of these laws to prove others. But some laws are not connected. For example, this law x * x = y does not imply x * y = y * x, because you can construct operations that obey the one but not the other. So we have these 4,000 or so laws, and there are about 22 million ordered pairs of them, and the question is which laws imply which other ones. The project was basically just to explore this entire graph. Any given edge is fairly easy: someone with a graduate-level exposure to algebra could take one of these pairs and, by hand, in half an hour, figure out whether the implication is true or false. But you have 22 million of these things, so you have to do it in an automated way. And we managed to do this.
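For the non-implications, the workhorse technique is a finite counterexample search. Here is a small hand-written sketch of the idea, using the classical associativity-versus-commutativity pair rather than one of the project's actual 4,000 laws: enumerate all multiplication tables on a tiny set and look for one satisfying the first law but violating the second.

```python
# To show law A does not imply law B, search small multiplication
# tables for an operation satisfying A but not B.

from itertools import product

N = 2  # size of the carrier set {0, 1}

def satisfies_assoc(op):
    return all(op[op[x][y]][z] == op[x][op[y][z]]
               for x, y, z in product(range(N), repeat=3))

def satisfies_comm(op):
    return all(op[x][y] == op[y][x] for x, y in product(range(N), repeat=2))

# Enumerate all N^(N*N) binary operations on {0,...,N-1}.
for cells in product(range(N), repeat=N*N):
    op = [list(cells[i*N:(i+1)*N]) for i in range(N)]
    if satisfies_assoc(op) and not satisfies_comm(op):
        # First hit is the left projection x*y = x: [[0, 0], [1, 1]]
        print("counterexample table:", op)
        break
```

A table found this way is a complete, machine-checkable certificate of the non-implication, which is what let so much of the graph be settled automatically.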
In about two or three months we completely determined this graph. Some implications were easy, some were hard, and we didn't know in advance which were which. This was crowdsourced; there's going to be a paper with about 50 co-authors, which may be close to a record in mathematics. We had these 22 million tasks, and people could work on one, or ten, or a million of them; they could deploy their favorite AI tool, scan all of these implications, and see which ones they could and couldn't do. Then it all gets uploaded to a central database, and everything has to be verified in Lean, so nothing can be corrupted. If even one of them were wrong, it would sort of ruin the whole thing, so everything has to be certified.
Most of us in the collaboration didn't know each other, and most of the participants are not even mathematicians, but different people contributed different things. One person might figure out one relationship between two of these equations; then someone else would generalize it and write a computer tool, in Rust say, to search all the other implications; and then a third person would take that computer program and make it produce output in Lean so it could be uploaded. No single person could do all of this. We also have these beautiful visualizations that I have no idea how to write. But it all got put together.
Surprisingly, we didn't actually use modern AI very much. We didn't use all the fancy tools, partly because we didn't have the budget for the compute. We mostly used a much more classical technique called automated theorem proving, which I mentioned very briefly at the beginning. The main application of large language models in the project was to build the graphical interfaces around it; they were not super successful at the actual tasks themselves, though I think that's just a matter of time. But it's a new way of doing mathematics that you couldn't do without all these tools; you could not prove 22 million implications with just humans.
So where do I see all these tools in the near future? The naive thing is that you could just type your favorite problem into ChatGPT and it would give you the answer; this does not work. But there are lots of side tasks which are very, very useful. Semantic search is already beginning to get good: you write a very vague question, like "I need a tool that controls this random variable given this assumption, but I don't know what to search for," and if it's something fairly standard it will tell you exactly what it is, or, even if it doesn't give you the right answer, it will give you some keywords you can search for. Formalization is still a pain; as I said, it takes ten times longer to formalize a proof than to write it by hand, but AI tools are beginning to chip that down from ten to nine to eight. There are lots of little AI assistants that can speed up either breaking a big proof into lots of little pieces or filling in each of the little pieces. And I think once we have more databases, we can do a lot more with math than we currently can. A big bottleneck is that we have to actually get people to create high-quality math databases, and we don't have enough. I think we need to change, or at least broaden, our workflows, the ways we do mathematics.
The traditional project is that you pick one hard problem, or maybe two or three related problems, and you work towards that goal, and every step has to be correct; you have to have 100% validity in your steps, or you don't have a proof at the end. But many of these tools, particularly machine learning and AI tools, have failure rates; sometimes they give you rubbish. So either you need to filter everything through a verifier like Lean, which is one possibility, or, more generally, you need a very modular project where it's okay to have a failure rate. Instead of solving one problem, you solve 22 million problems, and if an AI tool only solves 10% of those, that's already over two million of the tasks done; you add those to the pile and then run another AI. We don't do that currently. In the other sciences, and in the real world, you deal with failure rates all the time. So we need to embrace ways of doing mathematics that use AI tools and still produce rigorous results at the end.
And it seems like it is possible. The final thing that I think will potentially be transformed is that we can use these tools to find new ways to teach mathematics. AIs are clearly already being used to do homework assignments. But AI as a service, done in a pedagogically sound way, where the AI teaching system does not give you the whole answer, but maybe does all the calculations for you while you still supply the high-level direction, or conversely gives you the high-level prompts while you do the step-by-step details, could really enhance pedagogy, I think. Textbooks could be much more interactive. Right now, if you read a math textbook and there's a step you don't understand, you have to ask the instructor to explain it, or you need to spend hours working out by yourself what was really meant, especially if there's a typo. Future textbooks should come with AI assistance: you could just ask, "Can you explain Lemma 3.5?" You could ask really specialized questions, you could open up the proofs, and maybe you could click through to a Lean version of a proof. Every sentence could be expanded out into a much longer explanation, expanding and contracting as needed. There's already some prototype software for doing this semi-automatically. So that's also, I think, very exciting. I think those are all the different things that I see, so thank you very much.
All right, thanks very much for the talk. Professor Tao is going to take some questions. There's a button close to your seat that you can use to ask questions; if it doesn't work, I can come over and give you the mic. All right, so, questions? Yeah, please go ahead. First of all, Professor Tao, thank you for the very wonderful, engaging, and exciting talk.
Yes.
Okay, hello. Thank you, Professor Tao, for the very wonderful, engaging, and thought-provoking talk. My question is in regard to your mention of some areas of math lacking the data necessary for machine learning algorithms to tackle them. It seems like a lot of the problems you presented were of a very combinatorial nature, and very amenable to and approachable by computers or machine learning algorithms. So I was curious whether it would be possible to use computers or machine learning algorithms to attack problems in other areas of math that might not have as much data.
Right, yeah, that is a challenge. On the formal side, algebraic types of mathematics have actually been easier to formalize than analysis topics. And, weirdly, as I said, large language models actually seem to be able to do high-level algebraic reasoning a bit better than high-level combinatorial or analytic reasoning. Somehow in algebra in particular there's often a canonical way to do things, a canonical way to define a certain object and how to use it.
Well, I'm not an algebraist. For most areas, I can't even imagine what data set we'd need. Some of it is just collecting the literature in one place, which is already useful; there are so many fields where the known results are completely dispersed, and hopefully AI can help with literature reviews; that's another possible use case. Another thing is that the data that's out there is mostly only the results, the proofs that actually worked. One thing these DeepMind-type explorations have found is that it's very important to have in the training data not just complete, successful proofs of various problems, but also proofs that got halfway and then stopped. Maybe the database also has another proof of something else that reached a similar halfway point but was able to keep going, and somehow the system can combine the two: the one that proved the right thing but stopped halfway, and the one that proved something different but kept going. Behind every successful proof there are months and months of trying things that didn't work, figuring out why they didn't work, and changing your proof strategy, and this is almost completely private; we don't share this information. Maybe, if you're lucky, your adviser might share one of these stories, or you're in a collaboration and you see a collaborator doing this in real time, but it's not something we share, for the same reason we don't publish negative results as much in the sciences. Maybe we need to do more of that, to just show the process. But AI in particular has no access to this data, and so it can either produce something that looks like a correct proof, or something which is rubbish that it doesn't know how to fix.
When things go wrong, they're very bad at recovering right now.

In one of the examples you mentioned, there were lots of propositions; most of them were easy to formalize, but a few were very hard to formalize, right?
Yeah. Is there any common structural pattern to those hard-to-formalize propositions?

That's a good question. What we found was that there were these techniques: for these 22 million implications, you could try an automated theorem prover, you could play with small examples, just small multiplication tables, and there were certain constructions based on linear algebra. Each technique covered some percentage of the implications, and there was this residual set of survivors that were somehow immune to all the previous techniques. They were a very eclectic set; the only thing they had in common was that none of the easy methods worked on them. So they did have this math-from-Mars aspect. As I said, there are some famous laws of algebra, like the associative law and the commutative law, that are very well studied, and all the implications involving those were very easily dealt with. And then there were just these few random survivors that made it through several filterings by different methods, and there wasn't a common pattern to them.
Maybe you could collect all of them and train a model just on the hard propositions?
Yeah, we were hoping to. In some sense it was a pity that the project succeeded in two or three months; I was hoping we would get stuck with some core of, say, 1,000 hard ones, and there would be different people trying increasingly clever things. But we ended up with a core of about 100 problems, and some of them implied others, so if you solved one you could get ten for free. The last 100 were just done by hand; we didn't use fancy AI or anything at the end. Thank you.
Very interesting lecture. I'm curious whether you've investigated the latent representation of proofs or parts of proofs, because one of the breakthroughs that happened with language models was having a representation of first words, then combinations of words, then ideas in a high-dimensional space, with which the models are able to recognize patterns. I'm thinking there might be an analogue of this for proofs, and also for objects like knots and so on.
It's certainly possible. We have found, for example, with databases of graphs, that there are different ways you can describe a graph: you can describe the adjacency matrix, which is a big string of ones and zeros, or you can list the edges, and there are some other representations too. We have definitely found that the representation matters: some machine learning algorithms, when trained on graphs with the wrong representation, are much less efficient, and we don't really understand why. Right now this is basically a trial-and-error process. Every time you say, "I'm going to use a neural network to attack my problem," you have to make all these decisions at the beginning: how you represent your data, what your parameters are, what your loss function is. It's an art, not a science yet.
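As a concrete illustration of the representation choices just mentioned (a made-up toy graph, not one of the actual databases): the same four-vertex path can be fed to a model as an edge list or as a flattened 0/1 adjacency string, and these encodings can behave very differently as training data.

```python
# Two encodings of the same 4-vertex path graph 0-1-2-3.
edges = [(0, 1), (1, 2), (2, 3)]

# Build the symmetric adjacency matrix from the edge list.
n = 4
adj = [[0] * n for _ in range(n)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

# Flattened 0/1 string representation of the adjacency matrix.
adj_string = "".join(str(b) for row in adj for b in row)
print(adj_string)   # "0100101001010010"
```

Both encodings carry identical information, which is exactly why it is surprising, and not yet well understood, that models trained on one can do much worse than models trained on the other.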
It's possible that the same thing is true for proofs. For now, these proofs are just being tokenized as text, and large language models just have a general understanding of text as written by humans, but this seems good enough.

But if you got away from the text and closer to the intrinsic mathematical operations, logic, and reasoning in the parts of the proof, then it could build predictions and find patterns.

Theoretically, yes. I think the thing is, it's very hard to compete with large language models, because they have been trained, at great expense, on massive amounts of text, and there's no comparable database for these diagrammatic representations of proofs. Maybe you could start with all the Lean proofs and train from scratch, but that's maybe 0.1% of the size of the internet text that LLMs get to train on. But this will happen, I think.
Yeah. Thank you.
Last question.
Thank you very much for your talk. The examples you gave were from specific areas of mathematics, knot theory or group theory, where the problem was already pretty well identified, yes? And then there was some proof strategy which was explored further.
Now, in the 1950s and 1960s, as you know, what people discovered is that people in different areas of mathematics were doing similar things in different languages, and this led to the development of category theory and the notions of functors and so on, in order to transfer structures and results from one area to another. This discussion of latent structures reminds me of that. Is there any attempt to use the language and formalism of category theory to, for example, build high-level proof assistants that don't depend on a specific universe of objects?
Yes, there are proposals. There's this thing called homotopy type theory, which is in principle designed to do that: it writes proofs in a way that lets you continuously perturb the objects you're working with while the proof somehow survives. None of the major proof assistant languages are currently based on that framework, though in principle it could be done. Part of the issue is that most of mathematics is not written in these frameworks yet; it's written in much more orthodox, set-theory-type foundations. And in order to have a critical mass of people working on these things, you need to work in a language that most mathematicians are familiar with. So it's two steps removed: ideally we could write proofs in this abstract language, but that's not what we do as working mathematicians every day.
So the amount of effort people would put into formalizing things that way would be very, very small right now. Maybe once formalization becomes a lot easier with AI assistance, we could start doing it that way, and maybe that would actually be a much more efficient approach: you could easily take a proof in topology and change it to a proof in geometry or something, without having to redo everything. Yeah.