- I take AI risk seriously, and I’m glad people are working on it.
- I’m not an AI-risk maximalist: I put it on par with bio-risk and nano-risk, all of which we should take seriously.
- I’m not sure direct work on AI risk is obviously more valuable for reducing AI risk than indirect work, especially work on neurotech.
People ask me about AI risk a lot, maybe because they think (former?) machine learning researchers should have opinions on it. Regardless, here are my current beliefs on the subject. If they’re wrong, please fix them by emailing me.
AI risk is a big problem
The core claims justifying concern about AI are:
- Humanity can build AI that is more intelligent and capable than humans in every way.
- We can probably build it soon, though even if it’s far off it’s still important to think about.
- There’s no reason superintelligent AI has to behave in any way we expect it to, including being nice toward humans. (The orthogonality thesis.)
- There’s no reason to think superintelligent AI can be controlled or stopped by humanity.
I agree with all the above and am grateful people are working on the problem.
AI risk is not the sole biggest problem
I’m not convinced the risks from AI are greater than those from biotech or nanotech.
First, to briefly address some common arguments I hear:
- AI isn’t more risky based on base rates. Modern developments in biotech and nanotech are as unprecedented in Earth’s history as developments in AI.
- AI isn’t more risky based on the pace of development. Developments in AI are not obviously occurring faster, inasmuch as that can be measured, than in biotech or nanotech, especially considering the last 100 years vs. the last 10. They are, I admit, better publicized.
- There are no obvious arguments from biology or physics that biotech and nanotech are less risky than AI, as there are for the risk of black holes from particle accelerators or the risk of igniting the atmosphere with nuclear weapons.
- A superintelligent AI could develop advanced biotech or nanotech on its own, so one could argue that all biotech and nanotech risk counts toward AI risk too. This is double counting: we’re only worried about the first (and therefore also last) existential disaster.
The most convincing argument for AI-risk maximalism is that AI could lead to worse outcomes than biotech or nanotech. While AI, biotech, and nanotech all have the potential to cause “high entropy” disasters like reducing earth to a lifeless rock or puddle of grey goo, only AI seems like a route to low-entropy outcomes like S-risk scenarios or paperclip maximizers. The key question is how likely such scenarios are and how bad they would be.
I’m quite uncertain about the likelihood of low-entropy outcomes, and willing to accept they could be pretty likely. It’s their badness that I’m not convinced of, which I suspect is my key disagreement with most AI-risk folks. Specifically, I suspect an AI system capable of usurping humanity is also likely a greater moral patient than humanity.
Consider why you think you’re more important than an ant, which I’m pretty sure you do. Now consider what attribute an AI system would have to possess to make it okay for the AI to treat humanity the way humanity treats ants. Your estimate of AI risk is proportional to how likely you think we are to create an AI that doesn’t possess more of this attribute. Let’s just call it moral patienthood.
Maybe you have a theory of moral patienthood that suggests the AI we create will almost certainly not be a greater moral patient than us — a “misaligned by default” theory. I don’t. Absent a better understanding of cognition, I’d rather maintain as open a theory of moral patienthood as possible. Not doing so seems anthropocentric and a form of value lock-in. And in such an open theory, an AI exhibiting the kinds of intelligent behaviors required to usurp humanity (self-improvement, self-preservation, a world model, etc.) would seem to possess at least equal if not greater moral patienthood than humanity. And yes, that includes a paperclip maximizer, whose goals are as incomprehensible to me as my goals are to an ant.
This is not a might-makes-right, or might-makes-moral-patienthood, argument by any means. One can imagine a narrow, non-moral-patient AI that hacks every nuclear launch system and blows up the planet. Or one can imagine an AI system exploding into supercompetence in a few seconds and accidentally destroying the world, itself included, by not knowing its own strength. But this is a much smaller part of mind-space to worry about than warrants AI-risk maximalism.
What to do about it?
Direct AI-risk work is great: here’s a nice breakdown of current research. Again, I’m thrilled people are doing it. But despite what the name suggests, I’m not sure it’s more useful for reducing AI risk than indirect work at present.
Using narrow AI to ensure we’re building safe general AI is a promising strategy for AI safety, but the bottlenecks to doing it often aren’t in the purview of direct AI risk research. For example, mathematical theories are a key tool in our arsenal for reasoning about AI risk (not to mention all of science). Existing ML techniques could be used to help produce mathematical proofs relevant to AI safety, but they are bottlenecked by the state of tooling for formal mathematics. Tools like Lean could be developed within a few years to a point where narrow AI could use them to prove mathematical theorems beyond what humans can, including, hopefully, proofs relevant to AI safety. But nobody at present considers such tool-building work to be direct work on AI risk. Similar bottlenecks exist in the infosec and program verification fields, advances in which could potentially guarantee safety against many kinds of AI risk.
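To give a concrete sense of what such tooling produces, here is a toy machine-checkable statement in Lean 4. It is only an illustration of the artifact format; any proof actually relevant to AI safety would be vastly larger and harder to state.

```lean
-- A trivial theorem, stated and proved in Lean 4.
-- The kernel mechanically verifies the proof term, so a correct
-- result doesn't depend on trusting whoever (or whatever) wrote it.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

That last property is the point: if a narrow AI emits a proof and Lean’s kernel accepts it, the proof is valid regardless of how opaque the system that found it was.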
More importantly, neurotech, and especially neuromodulation, is hugely underrated by the AI risk community (though Neuralink has muddied the waters a bit with its “merge with AI” thesis).
The neuroscience of human brain function may or may not be useful for developing more powerful AI algorithms. But neurotech is the key enabler of better theories and metrics of moral patienthood, consciousness, and wellbeing, which are necessary both to estimate AI risk and ameliorate it.
Many in the AI field seem to ignore the fact that neuroscientific insights — say, a predictive theory of valence, fine-grained experiments determining the boundaries of consciousness, or an existence proof of substrate independence by offloading a cognitive task via a brain-computer interface — could radically change our beliefs about AI risk. Or perhaps they assume such insights are centuries away, obtainable only through a slow and unpredictable process of navel-gazing and neurological stamp-collecting. True, there is a lot of that in neuroscience. But in fact neuroscientific insights are obtained through engineering, which can be predictably accelerated with greater effort. (Though, admittedly, not effort you can exert solely from your laptop.)
Have feedback? Find a mistake? Please let me know!