Thoughts on AI Risk


People ask me about AI risk a lot, maybe because they think (former?) machine learning researchers should have opinions on it. Maybe we shouldn’t. But regardless, here are my current beliefs on the subject. If they’re wrong, please fix them by emailing me.

AI risk is a big problem

The core claims justifying concern about AI are:

I agree with all the above and am grateful people are working on the problem.

AI risk is not the single biggest problem

I’m not convinced the risks from AI are greater than those from biotech or nanotech.


An aside on bio risk: there's basically no x-risk from non-engineered pandemics, and the amount of engineering required to make an x-risky pandemic is significant. But it's absolutely possible. The key factors are virulence, transmissibility, incubation time, detectability, and adaptability.

First, to briefly address some common arguments I hear:

The most convincing argument for AI-risk maximalism is that AI could lead to worse outcomes than biotech or nanotech. While AI, biotech, and nanotech all have the potential to cause “high entropy” disasters like reducing earth to a lifeless rock or puddle of grey goo, only AI seems like a route to low-entropy outcomes like S-risk scenarios or paperclip maximizers. The key question is how likely such scenarios are and how bad they would be.

I’m quite uncertain about the likelihood of low-entropy outcomes, and willing to accept they could be pretty likely. It’s their badness that I’m not convinced of, which I suspect is my key disagreement with most AI-risk folks. Specifically, I suspect an AI system capable of usurping humanity is also likely a greater moral patient than humanity.

Consider why you think you’re more important than an ant, which I’m pretty sure you do. Now consider what attribute an AI system would have to possess to make it okay for the AI to treat humanity the way humanity treats ants. Your estimate of AI risk is proportional to how likely you think we are to create an AI that doesn’t possess more of this attribute. Let’s just call it moral patienthood.

Maybe you have a theory of moral patienthood that suggests the AI we create will almost certainly not be a greater moral patient than us — a “misaligned by default” theory. I don’t. In lieu of better understanding of cognition, I’d rather maintain as open a theory of moral patienthood as possible. Not doing so seems anthropocentric and a form of value lock-in. And in such an open theory, an AI exhibiting the kinds of intelligent behaviors required to usurp humanity (self-improvement, self-preservation, a world model, etc.) would seem to possess at least equal if not greater moral patienthood than humanity. And yes, that includes a paperclip maximizer, whose goals are indeed as incomprehensible to me as my goals are to an ant.

This is not a might-makes-right, or might-makes-moral-patienthood, argument by any means. One can imagine a narrow, non-moral-patient AI that hacks every nuclear launch system and blows up the planet. Or one can imagine an AI system exploding into supercompetence in a few seconds and accidentally destroying the world, itself included, by not knowing its own strength. But this is a much smaller part of mind-space to worry about than warrants AI-risk maximalism.

What to do about it?

Direct AI-risk work is great: here's a nice breakdown of current research. Again, I'm thrilled people are doing it. But despite what it says on the tin, I'm not sure it's more useful for reducing AI risk than indirect work at present.

Using narrow AI to ensure we’re building safe general AI is a promising strategy for AI safety, but the bottlenecks to doing it often aren’t in the purview of direct AI risk research. For example, mathematical theories are a key tool in our arsenal for reasoning about AI risk (not to mention all of science). Existing ML techniques could be used to help produce mathematical proofs relevant to AI safety, but are bottlenecked by the state of tooling for formal mathematics. Tools like Lean could be developed within a few years to a point where narrow AI could use them to prove mathematical theorems beyond what humans can. But nobody at present considers such tool-building work to be direct work on AI risk. Similar bottlenecks exist in the infosec and program verification fields. Advances in either could secure against many kinds of AI risk.
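To give a flavor of what "machine-checkable mathematics" means here (this is my illustration, not from the original argument, and the theorem is just a toy): a proof assistant like Lean only accepts statements whose proofs it can verify mechanically, which is exactly the property that would let narrow AI search for proofs at scale without us having to trust its reasoning.

```lean
-- Toy example of a fully machine-checked proof in Lean 4.
-- The kernel verifies every step; a proof-search system
-- (human or AI) only has to *find* the proof, not be trusted.
theorem add_comm_toy (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The point isn't this particular lemma, of course; it's that once the tooling is mature enough, "did the AI prove it?" becomes a question the checker answers, not the AI.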

More importantly — though Neuralink has muddied the waters a bit with its “merge with AI” thesis — neurotech, and especially neuromodulation, is hugely underrated by the AI risk community.
The neuroscience of human brain function may or may not be useful for developing more powerful AI algorithms. But neurotech is the key enabler of better theories and metrics of moral patienthood, consciousness, and wellbeing, which are necessary both to estimate AI risk and ameliorate it.

Many in the AI field seem to ignore the fact that neuroscientific insights — say, a predictive theory of valence, fine-grained experiments determining the boundaries of consciousness, or an existence proof of substrate independence by offloading a cognitive task via a brain-computer interface — could radically change our beliefs about AI risk. Or perhaps they assume such insights are centuries away, only obtainable through a slow and unpredictable process of navel-gazing and neurological stamp-collecting. True, there is a lot of that in neuroscience. But in fact neuroscientific insights are obtained through engineering, which can be predictably accelerated with greater effort. (Though, admittedly, not by effort you can do solely on your laptop.)

Have feedback? Find a mistake? Please let me know!