(I’m ripping off Sasha Chapin and writing a “post” every “day” for 30 days. Kindly adjust quality expectations.)
I spend a lot of time deflating people’s dreams that machine learning can help them with their problem.
Everyone knows there’s been a revolution in machine learning (ML) over the past decade. What they tend not to know is that the revolution has been spiky, not smooth. ML hasn’t improved a large amount at every task; rather, it has improved a flabbergasting amount at a few tasks, with other capabilities sometimes coming along for the ride.
Instead of telling my next interlocutor why sprinkling some sweet machine learning dust on their problem won’t help, I figured I should write down exactly where I think ML has progressed and let them decide whether that’s relevant to their problem. (I would point them toward various journalistic analyses of the field, except I haven’t found one that isn’t worryingly credulous.) If the problem you’re trying to solve isn’t in an area where there’s been a revolutionary advance, you should assume ML is no more relevant to solving it than it was in 2012.
Large-scale model training
ML has figured out how to get mostly-non-diminishing returns to scale.
If you have a massive dataset (or the equivalent of one, like a simulator) then scaling up the amount of compute and model parameters you throw at a problem will get you better performance. Gwern’s analysis in The Scaling Hypothesis says it better than I can.
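One common way to formalize "more parameters, better performance" is a power law, where loss falls as L(N) = (N_c / N)^alpha in the parameter count N. The sketch below assumes that functional form; `N_C` and `ALPHA` are hypothetical, illustrative constants, not fitted values from any real model. It shows the sense in which returns are "mostly non-diminishing": each 10x increase in parameters cuts the loss by the same constant factor, even though the absolute gains shrink.

```python
# Sketch of an assumed power-law scaling curve, L(N) = (N_C / N) ** ALPHA.
# N_C and ALPHA are hypothetical, illustrative constants, not fitted values.
N_C = 1e13
ALPHA = 0.08


def power_law_loss(num_params: float) -> float:
    """Loss predicted by the assumed power law for a given parameter count."""
    return (N_C / num_params) ** ALPHA


if __name__ == "__main__":
    previous = None
    for exponent in range(6, 12):  # 1e6 .. 1e11 parameters
        n = 10.0 ** exponent
        loss = power_law_loss(n)
        # Under a power law this ratio is always 10 ** -ALPHA, whatever the scale.
        ratio = "" if previous is None else f"  ratio vs. 10x smaller: {loss / previous:.4f}"
        print(f"N = 1e{exponent:<2d}  loss = {loss:.4f}{ratio}")
        previous = loss
```

The constant ratio is the point of the illustration: under this assumption, the next 10x of compute and parameters buys the same relative improvement as the last one.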
Specific advances that fall under this heading include GPU-filling self-attention architectures like transformers, better optimization algorithms like Adam, and better training schemes such as GANs, masked language modeling (MLM), and proximal policy optimization (PPO).
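To make one of those items concrete, here is a minimal, dependency-free sketch of the standard Adam update: exponential moving averages of the gradient and the squared gradient, with bias correction. The toy objective f(x) = (x - 3)^2 and the hyperparameter values are assumptions chosen for illustration; in practice you would use a library implementation such as PyTorch's `torch.optim.Adam`.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a 1-D function with the Adam update rule, given its gradient."""
    x = x0
    m = 0.0  # first-moment estimate (moving average of gradients)
    v = 0.0  # second-moment estimate (moving average of squared gradients)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction: both averages start at zero, so early estimates are scaled up.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Per-parameter step, normalized by the running gradient magnitude.
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


if __name__ == "__main__":
    # Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
    x_star = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
    print(f"minimizer found: {x_star:.4f}")
```

Dividing by the square root of the second-moment estimate is what makes Adam relatively insensitive to the raw scale of the gradients, one reason it became a default for large models.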
Software and hardware infrastructure
Related to, but distinct from, large-scale model training are advances in ML infrastructure.
If you have an ML idea you want to try, there is almost certainly a well-written software library with which you can try it, and easily accessible cloud hardware to run it on. This is true for small models as well as big ones.
Pretraining and transfer learning
Perhaps more a discovery than an advance, but if you want to perform a task that’s similar to one a pretrained model can do, you can often adapt the pretrained model to your task with little additional data and compute.
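A common, cheap way to do this adaptation is to freeze the pretrained model and train only a small new "head" on its output features (often called a linear probe). The dependency-free sketch below illustrates that pattern under loud assumptions: `frozen_features` is a hypothetical stand-in for a real pretrained network's feature extractor, and the six labeled examples are made up. Only the logistic-regression head's weights are ever updated.

```python
import math


# Hypothetical stand-in for a pretrained network: a fixed feature map whose
# parameters are never updated during adaptation ("frozen").
def frozen_features(x):
    a, b = x
    return [math.tanh(a + b), math.tanh(a - b), 1.0]  # last entry acts as a bias feature


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(examples, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression head on frozen features with gradient descent."""
    feats = [frozen_features(x) for x in examples]  # computed once; backbone stays frozen
    w = [0.0] * len(feats[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for f, y in zip(feats, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)))
            for i, fi in enumerate(f):
                grad[i] += (p - y) * fi  # gradient of the log loss w.r.t. w[i]
        w = [wi - lr * gi / len(feats) for wi, gi in zip(w, grad)]
    return w


def predict(w, x):
    """Classify x using the frozen features and the trained head."""
    return int(sigmoid(sum(wi * fi for wi, fi in zip(w, frozen_features(x)))) > 0.5)


if __name__ == "__main__":
    # Made-up toy task: label is 1 when a + b > 0.
    xs = [(1.0, 0.5), (0.3, 0.4), (2.0, -1.0), (-1.0, -0.5), (-0.2, -0.6), (-2.0, 1.0)]
    ys = [1, 1, 1, 0, 0, 0]
    head = train_head(xs, ys)
    print([predict(head, x) for x in xs])
```

The toy works because the assumed features already encode what the task needs; that is the bet transfer learning makes, and it only pays off when your task resembles what the pretrained model learned.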
Areas where progress has been made, but no silver bullets yet
- Robustness against adversarial attacks
  - It’s possible to break modern deep learning systems in just about every way you can imagine, and many you can’t.
- Bayesian optimization
- Incorporating expert knowledge/domain-specific models (i.e., imbuing models with inductive biases based on physics, chemistry, source code, etc.)
  - I could be convinced that AlphaFold is an example of this, but it would still be the only example I know of
- ML for robotics
- Multi-agent RL
- Exploration and intrinsically motivated learning (except when derived from large-scale model training)
- Mostly advances in what you can buy from cloud providers
- Bayesian inference
- Causal inference
Areas where, despite occasional attention and hype, not much progress has been made
- Tabular machine learning
- Few-shot and meta learning (except when derived from large-scale model training)
- Uncertainty quantification
- Time-series forecasting
If you have a strong opinion that this categorization is wrong, please let me know! This is a hard assessment to make, and I don’t have a perfect view of the field by any means.