Request for Collaborators: Better Tools for Math Research

August 30, 2019, updated April 1, 2020

I’d like to build (maybe with you!) a user-friendly web platform for collaborating on mathematical proofs and organizing mathematical knowledge.

Why bother?

Progress in mathematics is upstream of nearly all progress in science and technology. But unfortunately, progress in math is getting harder to make. Humanity’s mathematical knowledge is getting larger, more complex, and more Balkanized¹ by the year.

Better tools can help. Calculators, Wolfram Alpha and the like, Wikipedia, and even humble blogs play a huge role in modern mathematical research. But they don’t directly aid in the core task of math research: assembling ideas into a logical sequence that proves something interesting. Scratch paper, whiteboards, and word processors remain the only aids mathematicians have for this task.

In other words: mathematics today is like programming before IDEs and GitHub. It’s time we build equivalent tools for math research.

What exactly am I suggesting we build?

A Proof Development Environment (PDE) web app
A knowledge base linking together proofs written and published using the PDE
Tight integration between prose proofs, formal proofs, and automated theorem provers

Proof Development Environment

A mathematical proof is a sequence of statements that leads logically from some premises to some conclusion. Each statement in a proof is justified by some combination of previous statements in the sequence.

Usually they’re written in prose and algebra, but here’s a visual example:

Proof Example

In this diagram, boxes are statements, and the sequence (or partially ordered set if you’re being fancy) of statements that makes up the proof is shown by the arrows: each statement logically follows from the statements pointing to it. Dotted-line boxes are statements that are themselves the result of proofs; the proof of the statement is inside the dotted-line box.

Nobody publishes proofs in a visual format like the above diagram. But behind every proof is a diagram like this, even if it only exists in the readers’ and writers’ heads.

Of course, the hard part of mathematics isn’t writing up a finished proof for publication. The hard part is coming up with the proof itself. Some people who develop proofs, like students, know exactly what premises they’re starting from and what conclusion they need to reach. But working mathematicians need to alternate between trying to prove things and trying to find interesting things they’re able to prove.

The process might proceed like this:

Proof Development Process Example

So, concretely: the proof development environment I’d like to build is an online editor for making directed acyclic graphs with expandable nodes, editable edges, and other nice features like

LaTeX support for the text in the nodes
Collaborative real-time editing
Version history
Contribution tracking
Tags and comments
Ability to save and share views (subsets of statements arranged in a certain way)
Easily search, cite, and link to other proofs
Ability to write statements in formal notation and use built-in proof checkers and proof assistants on statements so written

Proofscape is an underappreciated project that’s very close to what I’d like in a PDE. It’s built for presenting and explaining finished mathematical proofs, so some features would have to be added to make it useful for the proof development process. But it’s excellent. Rationale is a tool for argument mapping that’s similar in spirit and has a nice UI.

Mathematical Knowledge Base

“But it is not complicated. There’s just a lot of it!” - Richard Fenyman

If you’re reading some source code and encounter a function whose behavior you don’t understand, you can easily hunt down the code defining that function and read it. And if that code is itself confusing, you can hunt down the code for the functions used therein, etc., etc., until you understand what’s happening.

Mathematics, in principle, works the same way: you can justify every deduction made in the course of a (correct) mathematical proof, and you can justify every justification, etc., etc., all the way down until you hit axiomatic bedrock.

Unfortunately, unlike source code, most mathematical knowledge is not written in a complete, internally consistent form. Mathematical knowledge is scattered across textbooks, articles, websites like Wikipedia, Wolfram MathWorld, ProofWiki, and Math Overflow, and of course in mathematicians’ brains. This makes makes math hard to learn and understand, slows down mathematical progress, and in the worst case leads to mistakes.

A centralized knowledge base for mathematics would have many benefits: it would simplify search and learning, facilitate discovery, reduce duplicate effort, and be more amenable to computer augmentation. So it’s not surprising that people have tried to make one before. Notable attempts include the Lean mathematical libary (currently the most active), the Mizar Mathematical Library, and the Isabelle Archive of Formal Proofs.

I’m a huge fan of these projects and projects like them. But they haven’t been adopted by most mathematicians because their benefits do not outweigh the switching costs. They require a mathematician to learn a new formal language and replace their existing workflow with a proof-checker-based workflow, and the only immediate, obvious reward the mathematician gets is a guarantee of correctness for a proof they were pretty sure was correct already.

To get mathematicians to centralize their knowledge, we need to get them to adopt a tool that makes their current work easier, and build the knowledge base around that tool. To facilitate this, the online system that hosts the Proof Development Environment could also act as a publishing platform. Similar to GitHub’s public/private repo model, users can work on their proofs in private, but they can also make them public with one click. The network of published proofs, linked by citations, becomes the knowledge base.

How to solve the bootstrapping problem for the knowledge base? Hopefully we can convince a few true believers to produce an initial base of theorems, perhaps poached from spiritually similar projects like the Polymath Projects. It’s also important to make it easier for authors to cite proofs from within the knowledge base than from external sources. A well-designed citation feature in the PDE could allow inline search by name of proofs in knowledge base. And it’s important to make it easy for new users to contribute. In a PDE proof it’s easy to visually flag any theorems that are missing a citation, like Wikipedia’s red links, making it clear to users new and old where gaps are that need to be filled.

The most difficult issue in curating a mathematical knowledge base is handling multiple statements of the same theorem or slight variations thereof. Nobody wants a knowledge base clogged with restatements of the same ideas, but at the same time it’s useful to include multiple proofs of the same claim, or simpler versions of general theorems. This issue may solve itself by winner-take-all dynamics among citations, but other approaches may be necessary.

Attribution could be another issue. For proofs created entirely in the PDE, attribution should be as easy as looking at the edit history. But for proofs originally developed off-platform, one has to credit the original author of the proof and the person who added it.

Integration with Formal Mathematics tools

The examples I’ve given have all dealt with human-readable proofs (written in prose and LaTeX algebra) because enhancing mathematicians’ existing workflows is key to adoption. But in the longer term, I’m aligned with the QED Manifesto: we should get mathematics into formal notation and let computers loose on it.

Formalized mathematics is math written in a programming language rather than prose. A formal proof carries the exact same content as a prose proof, but it’s machine readable. This means it can be checked for correctness with proof verification software and, more importantly, someday expand upon itself autonomously via machine-intelligence-powered automated theorem provers. (Expanding the amount of training data for such a system is a major motivation for this project.) But, as mentioned above, almost no mathematician does their work in formalized notation. The costs to switching workflows outweighs the benefits.

Proofs written in the Proof Development Environment are a step closer to being formalized than standard prose proofs. The sequence of deductions denoted by the arrows in a PDE proof are an outline of a formal proof, mapping closely to ideas from Leslie Lamport and Freek Wiedijk. To encourage mathematicians to go all the way and formalize their proof, I’d like the PDE to be integrated with formal math languages like Lean, which has a burgeoning library of formally-proven mathematics of its own that could serve as a basis for the Knowledge Graph described above. Users can then choose to write statements in the PDE in these languages, and the PDE will run these languages’ verifiers automatically on the series of statements, visually displaying the state of correctness of the proof. Statements in a PDE proof will contain layers for multiple languages, so the same statement can be written in prose or in a proof language. In the longer term other features of formal math systems, like proof assistants, could be incorporated, effectively making the PDE a GUI for formalized mathematics.

Not all mathematics can or should be written in formal notation. Proof sketches and intuitions written in prose are valuable, and the PDE would support them. But formalized ideas are clearer and stronger, and they open up possibilities for intelligence augmentation for mathematicians.