Request for Collaborators: Better Tools for Math Research

2020-04 Note: I think an MVP could be built on top of Roam + CSS styling. If you’d like to collaborate, please get in touch: mwcvitkovic@gmail.com. I can provide funding.


I’d like to build (maybe with you!) a user-friendly web platform for collaborating on mathematical proofs and organizing mathematical knowledge.

Why bother?

Taboo but true: most progress in science and technology ultimately follows from progress in mathematics. Also taboo but true: progress in math is getting harder to make. Humanity’s mathematical knowledge is getting larger, more complex, and more balkanized by the year.

Better tools can help. Calculators, symbolic algebra systems, and Wikipedia are extremely helpful to mathematicians, but they don’t directly aid in the core task of math research: assembling ideas into a logical sequence that proves something interesting. Scratch paper, whiteboards, and word processors remain the only aids mathematicians have for this task. Mathematics today is like programming before IDEs and GitHub - we should build the equivalent tools for math research.

What exactly am I suggesting we build?

Proof Development Environment

A mathematical proof is a sequence of statements that leads logically from some premises to some conclusion. Each statement in a proof is justified by some combination of previous statements in the sequence.

Here’s a visual example:

Proof Example

In this diagram, boxes are statements, and the sequence of statements that makes up the proof is shown by the arrows: each statement logically follows from the statements pointing to it. Dotted-line boxes are statements that are themselves the result of proofs; the proof of the statement is inside the dotted-line box.

Just about zero mathematicians publish their proofs in a format like the above diagram. Proofs are published in prose. But behind every proof is a diagram like this, even if it only exists in the readers’ and writers’ heads.

Of course, the hard part of mathematics isn’t writing up a finished proof for publication. The hard part is coming up with the proof itself. Some people who develop proofs, like students, know exactly what premises they’re starting from and what conclusion they need to reach. But working mathematicians need to alternate between trying to prove things and trying to find interesting things they’re able to prove.

The process might proceed like this:

Proof Development Process Example

So, concretely: the proof development environment I’d like to build is an online editor for making directed acyclic graphs with expandable nodes, editable edges, and other nice features like

Proofscape is an underappreciated project that’s very close to what I’d like in a PDE. It’s built for presenting and explaining finished mathematical proofs, so some features would have to be added to make it useful for the proof development process. But it’s excellent. Rationale is a tool for argument mapping that’s similar in spirit and has a nice UI.

Mathematical Knowledge Base

“But it is not complicated. There’s just a lot of it!” - Richard Fenyman

If you’re reading some source code and encounter a function whose behavior you don’t understand, you can easily hunt down the code defining that function and read it. And if that code is itself confusing, you can hunt down the code for the functions used therein, etc., etc., until you understand what’s happening.

Mathematics, in principle, works the same way: you can justify every deduction made in the course of a (correct) mathematical proof, and you can justify every justification, etc., etc., all the way down until you hit axiomatic bedrock.

Unfortunately, unlike source code, most mathematical knowledge is not written in a complete, internally consistent form. Mathematical knowledge is scattered across textbooks, articles, websites like Wikipedia, Wolfram MathWorld, ProofWiki, and Math Overflow, and of course in mathematicians’ brains. This makes makes math hard to learn and understand, slows down mathematical progress, and in the worst case leads to mistakes.

A centralized knowledge base for mathematics would have many benefits: it would simplify search and learning, facilitate discovery, reduce duplicate effort, and be more amenable to computer augmentation. So it’s not surprising that people have tried to make one before. Notable attempts include the Mizar Mathematical Library and the Isabelle Archive of Formal Proofs.

I am a huge fan of these projects and projects like them. But they haven’t taken off because their benefits to a mathematician do not outweigh the switching costs. They require a mathematician to learn a new formal language and replace their existing workflow with a proof-checker-based workflow, and the only immediate, obvious reward the mathematician gets is a guarantee of correctness for a proof they were pretty sure was correct already.

To get mathematicians to centralize their knowledge, we need to get them to adopt a tool that makes their current work easier, and build the knowledge base around that tool. Thus I’d like the online system that hosts the Proof Development Environment to also act as a publishing platform. Similar to GitHub’s public/private repo model, users can work on their proofs in private, but they can also make them public with one click. The network of published proofs, linked by citations, becomes the knowledge base.

How to solve the bootstrapping problem? The citations that proof authors include in their proofs (like the one made to the Fundamental Theorem of Arithmetic in the example above) can refer to other proofs in the knowledge base or to external sources. But the citation widget in the PDE will allow inline search by name of proofs in knowledge base, like how you tag friends on Facebook. So it will be easier for a user to cite a proof from the knowledge base than from an external reference that has to be hunted down by hand. This sets up an incentive for the knowledge base to maintain its internal consistency. Of course, in the beginning most citations will be external. But these citations can be analyzed by we curators of the system to find the most widely referenced theorems that are missing from the platform, and we can offer incentives to fill in these gaps.

As for getting users to make citations in the first place: unlike Wikipedia, it is easier in a PDE proof to flag missing citations. Any statement with no incident arrows must either be a definition or a missing citation to another theorem or axiom. Viewers of public proofs will also have a way to suggest missing citations, like GitHub issues or Wikipedia red links.

The most difficult issue in curating a mathematical knowledge base is handling multiple statements of the same theorem or slight variations thereof. Nobody wants a knowledge base clogged with restatements of the same ideas, but at the same time it’s useful to include multiple proofs of the same claim, or simpler versions of general theorems. This issue may solve itself by winner-take-all dynamics among citations, but other approaches may be necessary.

Attribution could be another issue. For proofs created entirely in the PDE, attribution should be as easy as looking at the edit history. But for proofs originally developed off-platform, one has to credit the original author of the proof and the person who added it.

Integration with Formal Mathematics tools

The examples I’ve given have all dealt with human-readable proofs (written in prose and LaTeX) because enhancing mathematicians’ existing workflows is key to adoption. But in the longer term, I’m aligned with the QED Manifesto: we should get mathematics into formal notation and let computers loose on it.

Formalized mathematics is essentially math written in a programming language rather than prose. A formal proof carries the exact same content as a prose proof, but it’s machine readable. This means it can be checked for correctness with proof verification software and hopefully someday expand upon itself autonomously via machine-learning-powered automated theorem provers. (Expanding the amount of training data for such a system is a major motivation for this project.) But, as mentioned above, almost no mathematician does their work in formalized notation. The costs to switching workflows outweighs the benefits.

Proofs written in the Proof Development Environment are a step closer to being formalized than standard prose proofs. The sequence of deductions denoted by the arrows in a PDE proof are an outline of a formal proof, mapping closely to ideas from Leslie Lamport and Freek Wiedijk. To encourage mathematicians to go all the way and formalize their proof, I’d like the PDE to be integrated with formal math languages like Mizar and Coq. Users can then choose to write statements in the PDE in these languages, and the PDE will run these languages’ verifiers automatically on the series of statements, visually displaying the state of correctness of the proof. Statements in a PDE proof will contain layers for multiple languages, so the same statement can be written in prose or in a proof language. In the longer term other features of formal math systems, like proof assistants, could be incorporated, effectively making the PDE a GUI for formalized mathematics.

Once a proof is the knowledge base is formalized, it never has to be formalized again. And there are enough true believers in the benefits of formalization to formalize highly-cited theorems in the knowledge base, or upload existing formalized proof libraries into the system. By treating knowledge base citations in the PDE like imports in a programming language, we make it so every formal proof added to the knowledge base lowers the marginal effort for a mathematician to write their own proofs in formal notation and thereby receive the benefit of automatic proof correctness checks.

Not all mathematics can or should be written in formal notation. Proof sketches and intuitions written in prose have much value, and the PDE supports them. But formalized ideas are clearer and stronger ideas, and they open up possibilities for intelligence augmentation for mathematicians.

Preliminary Adoption Strategy

  1. User interviews v1: Build a mockup PDE with Post-its and a whiteboard. Give to mathematics grad students and ask them to solve a homework problem with it. Iterate.
  2. User interviews v2: Build an online PDE (maybe adapting the forthcoming New Proofscape, which can run in the browser) that’s useful for homework (i.e. lets you save and print out your proofs in an acceptable format). Partner with an undergrad linear algebra course and pay/bribe/feed students who will use it to do their HW. Iterate.
  3. Start building knowledge base: Incent students to make their proofs public. (Note: this is tantamount to putting solutions to HW problems online. This is an intentional strategy to hopefully drive adoption.) Partner with different math courses to expand the topic coverage
  4. Start adding formalization support: Tour system around the various formalized math conferences. Get feedback and offer rewards for true believers to formalize commonly used proofs, or to port existing knowledge bases like the Mizar Math Library to the system.
  5. Longer term:
    1. Become Duck Duck Go’s and Google’s primary vertical for math questions.
    2. Partner with autoformalization folks like Christian Szegedy.

Further Reading