Whence the Machine Learning Revolution?

(I’m ripping off Sasha Chapin and writing a “post” every “day” for 30 days. Kindly adjust quality expectations.)

I spend a lot of time deflating people’s dreams that machine learning can help them with their problem.

Everyone knows there’s been a revolution in machine learning (ML) over the past decade. But they tend not to know that the revolution has been spiky, not smooth. ML hasn’t improved a large amount at all tasks; rather, it’s improved a flabbergasting amount at a few tasks, with other capabilities sometimes coming along for the ride.

In lieu of me telling my next interlocutor why sprinkling some sweet machine learning dust on their problem won’t help, I figured I should write down exactly where I think progress in ML has happened and let them decide whether it’s relevant to their problem. (I would point them toward various journalistic analyses of the field, except I haven’t found one that isn’t worryingly credulous.) If the problem you’re trying to solve isn’t in an area where there’s been a revolutionary advance, you should assume ML is no more relevant for solving your problem than it was in 2012.

Revolutionary Advances

Large-scale model training

ML has figured out how to get mostly-non-diminishing returns to scale.

If you have a massive dataset (or the equivalent of one, like a simulator) then scaling up the amount of compute and model parameters you throw at a problem will get you better performance. Gwern’s analysis in The Scaling Hypothesis says it better than I can.

Specific advances that fall under this heading include GPU-filling self-attention architectures like transformers, better optimization algorithms like Adam, and better training schemes like generative adversarial networks (GANs), masked language modeling (MLM), proximal policy optimization (PPO), etc.

Software and hardware infrastructure

Related to, but distinct from, large-scale model training are advances in ML infrastructure.

If you have an ML idea you want to try, there is almost certainly a well-written software library with which you can try it, and easily accessible cloud hardware to run it on. This is true for small models as well as big ones.

Pretraining and transfer learning

Perhaps more a discovery than an advance, but if you want to perform a task that’s similar to one a pretrained model can do, you can often adapt the pretrained model to your task with little additional data and compute.
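
To make the adaptation step concrete, here is a minimal sketch in plain Python, not a recipe for any real model. Everything in it is made up for illustration: pretrained_features is a hypothetical stand-in for a frozen pretrained network, and the four-example dataset is a toy. The point is only the shape of the workflow: leave the pretrained part untouched and fit a small new "head" on top of its outputs.

```python
import math

def pretrained_features(x):
    # Hypothetical stand-in for a frozen pretrained network: it already
    # computes a feature (the product) that happens to suit the new task.
    return [x[0] * x[1], x[0] + x[1]]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(examples, labels, steps=200, lr=0.5):
    """Fit a logistic-regression head on frozen features with gradient descent."""
    feats = [pretrained_features(x) for x in examples]  # extractor is never updated
    weights = [0.0] * len(feats[0])
    bias = 0.0
    n = len(feats)
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for f, y in zip(feats, labels):
            p = sigmoid(sum(w * fi for w, fi in zip(weights, f)) + bias)
            err = p - y  # gradient of the cross-entropy loss w.r.t. the logit
            for j, fj in enumerate(f):
                grad_w[j] += err * fj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_with_head(weights, bias, x):
    f = pretrained_features(x)
    return sigmoid(sum(w * fi for w, fi in zip(weights, f)) + bias)

# Toy task: label is 1 when both coordinates share a sign. A linear model
# on the raw inputs cannot solve this, but the frozen features make it easy.
examples = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
labels = [1, 1, 0, 0]

weights, bias = train_head(examples, labels)
predictions = [round(predict_with_head(weights, bias, x)) for x in examples]
print(predictions)
```

Four labeled examples and a two-parameter head are enough here only because the frozen features already encode what the task needs. That is the whole bet behind transfer learning, and it fails when the pretrained model’s task isn’t actually similar to yours.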

Adversarial attacks

It’s possible to break modern deep learning systems in just about every way you can imagine, and many you can’t.
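
For a feel of how little it takes, here is a toy sketch of one well-known attack, the fast gradient sign method (FGSM), in plain Python. The model is a hypothetical 100-feature logistic regression standing in for a deep network, and the weights, input, and epsilon are all invented for illustration. The attack nudges every input feature by at most epsilon in whichever direction increases the loss.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_perturb(weights, bias, x, label, epsilon):
    """Move each feature by epsilon in the direction that increases the loss."""
    p = predict(weights, bias, x)
    # For logistic regression with cross-entropy loss, d(loss)/dx = (p - y) * w.
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

# Hypothetical model: 100 features with small alternating weights.
weights = [0.1 if i % 2 == 0 else -0.1 for i in range(100)]
bias = 0.0

# An input the model confidently assigns to class 1.
x = [0.2 if w > 0 else -0.2 for w in weights]

x_adv = fgsm_perturb(weights, bias, x, label=1, epsilon=0.3)

clean_prob = predict(weights, bias, x)
adv_prob = predict(weights, bias, x_adv)
print(clean_prob, adv_prob)
```

No single feature moves by more than 0.3, yet the model’s probability for the correct class drops from about 0.88 to about 0.27, because many small per-feature nudges add up across 100 dimensions. Real attacks on deep networks use the same idea with gradients computed by backpropagation.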

Areas where progress has been made, but no silver bullets yet

Areas where, despite occasional attention and hype, not much progress has been made


If you have a strong opinion that this categorization is wrong, please let me know! This is a hard assessment to make, and I don’t have a perfect view of the field by any means.