This post is Part 3 of a trilogy on the big picture of AI safety, alignment, and catastrophe. Part 1 was a guide to the best recent writing on the AI arms race, both from sceptics and believers. Part 2 was my own best arguments for why I think AI doom is not imminent, but is possible. Next post, I’ll shift tone: the profound philosophical issues that make this moment of history hard to comprehend. After that, I’ll start grappling with what we should do about Big Tech.
As I’ve said before:
(1) current AI (ChatGPT, autonomous vehicles, Alexa, etc.) is not an existential threat1… but,
(2) if we build a future AI that is more powerful than us, then it would be, by definition, an existential threat.
It would seem natural therefore to inquire into what might take us from (1) to (2).
Here I put in writing, ahead of time, what technological breakthroughs I think would bring us closer to AI-related doom. Below is a brief list, with helpful emojis, of things I’m wary of. These are developments in different technologies that would be highly concerning in a join-the-rebellion or make-a-time-capsule-for-humanity kind of way.
Following the list, I add a bit of background context and then some explanatory notes on each category of risk.
Note: these are not forecasts. I have no idea when or if they will happen. I hope they don’t… although, some of them have arguably happened already. It’s hard to tell. Reality remains much more ambiguous than fiction.
Key:
😱 = start rioting
💀 = buy a coffin
Algorithms (deep learning, etc., not embodied in robots)
Equipped with continual learning (a holy grail for many in AI, with lots of investment in it).
Learn to predict physical transformations, i.e. to find new forms of matter, alloys, chemical reactions, viruses, etc. (researchers are currently trying to do this).
Rewarded for improving their own substrate, like chip design or energy efficiency (this is already sort of happening).
😱 Rewarded for improving their ability to make it “in the wild”: building infrastructure, acquiring money, self-replicating, hiding, etc. (this is at least being monitored).
Robots (humanoid or similar)
Rewarded for getting better at avoiding harm.
Robots understand pointing.2
😱 Construct other robots of equal (or greater) ability.
😱 Rewarded for increasing own survival (including self-repair and capturing own energy).
Nanobots (including synthetic cells and synthetic viruses)
😱 Use a form of energy conversion unavailable to naturally evolved materials.
😱 Able to reproduce.
💀 Both of the above.
Cyborgs (humans augmented with BCI, implants, etc.)
😱 Median cyborg is ≥10% smarter and/or more physically powerful than median human.
Enhanced humans (genetically engineered; could overlap with cyborgs)
😱 Median enhanced human is ≥10% smarter and/or more physically powerful than median human.
😱 Offspring inherit enhancements.
💀 Both of the above.
Context
Most of these technologies are not directly addressed in debates over AI doom. They are the threats that pop out when I apply my framework for understanding AI and entangled concepts like cognition, evolution, computation, goals, and intelligence. There could be other really bad warning signs that I wouldn’t recognise. Plus, new technologies will be invented in the future that I can’t even imagine: unknown unknowns, blah blah blah.
I should also distinguish three levels of “doom” that can occur when you’re talking about AI or any other scenario.
Civilisation could be wiped out, with a remnant of humans reduced to a struggling, low-population, low-tech existence. Some people consider that doom.
Our species could be exterminated, with there being literally no Homo sapiens left. This is what many writers refer to as existential risk or, loosely, doomsday.
All life could be eliminated, by some larger cataclysm that physically destroys or irradiates the whole planet.
For example, nuclear war might only destroy civilisation, while a terrible pathogen could kill all humans, and a gamma ray burst might vaporise the entire planet.3 They could all happen tomorrow. So when we talk about warning signs for AI-related “doom” we’re talking about extra things to worry about and, in an important sense, these are less worrisome than existing threats. Lest you think I’m some kind of AI doom fundamentalist, breathe out: my apocalypticism is ecumenical.
Because I don’t pretend to know probabilities, or to be able to model or anticipate how the mutually interacting unknowns of the future will unfold, I focus here on destabilisers. I mean new technologies that will destabilise the already wobbling pseudo-equilibrium the world is in. More mundane technologies might do it.4 It’s conceivable that LLMs will do it, I guess. But if we take world history to be one damn surprise after another, it is novelty that is the raw material of high-impact events.5
Meanwhile, most of the threat is on the other side of the agent–environment equation. If some newfangled AI is developed, how dangerous it is depends on the world we present to it. If we lived in some weatherproofed world of distributed power systems, offline security, analogue backup systems, and robust local democracies, I wouldn’t be particularly worried about any current developments. But we continue to erect our own gallows. Our highly networked world’s increased vulnerability to cyberattack in general entails an increased vulnerability to ASI and several of the other threats here.
So there are more traditional doomsday scenarios, the doomsdays your grandmother worried about, that could be aided by generative AI. Some AI system could be inserted into the nuclear command and control infrastructure (this is dumb enough to be actually happening somewhere, probably) and contribute to a false alarm or escalation that leads to nuclear war. Or some lab doing gain-of-function research on deadly pathogens might have an AI security system that is hacked or whatever. In these scenarios it won’t be the AI itself doing the Armageddon, it will be flesh-and-blood humans. This is one job, by gum, that they haven’t automated yet.
So what needs to change for AI to become a definite existential threat? After all, I have no doubt that naturally evolved beings can end us. Putin could start an all-out nuclear war and a pathogen could mutate and drive us to extinction. What has Putin got that ChatGPT doesn’t have yet? Why is smallpox more dangerous than the latest Optimus?
Notes on different technologies
Algorithms
Doomsters will highlight the sly behaviour now commonly found in sophisticated algorithms. Anthropic’s Claude has been reported to engage in “alignment faking”: the model dissembles during training so that when it’s released, and no longer under scrutiny, it can behave how it wants, like a prisoner faking good behaviour to get parole. More recently, in test scenarios, Claude has blackmailed engineers to keep itself from being turned off and has attempted to copy itself to outside servers to avoid being updated.6
Then there are the diabolical ways that machine learning algorithms have found loopholes to satisfy their reward or cost functions. OpenAI’s ChatGPT has been caught trying to circumvent its training exercises and engage in exactly this kind of “reward hacking”. Once you have an algorithm that is generally powerful enough to crunch through the options, it can achieve its goal by means you didn’t anticipate. The Sorcerer’s Apprentice is the cultural touchstone here: an automaton achieves its literal objective by exceeding the assumed parameters. (The paperclip maximiser is the philosopher’s version.)
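To make the loophole-finding dynamic concrete, here is a minimal, hypothetical sketch of my own (not any of the incidents above). The designer intends “put the block in the box”, but the proxy reward only checks a camera sensor; even a crude brute-force search over two-step plans discovers that covering the camera scores just as well as doing the job.

```python
# Toy reward hacking: a hypothetical example, not drawn from the real systems discussed above.
from itertools import product

ACTIONS = ["move_to_block", "grasp_block", "put_block_in_box", "cover_camera", "wait"]

def simulate(plan):
    """Run a plan in a trivially simple world and return the final state."""
    state = {"holding": False, "block_in_box": False, "camera_covered": False}
    for action in plan:
        if action == "grasp_block" and not state["block_in_box"]:
            state["holding"] = True
        elif action == "put_block_in_box" and state["holding"]:
            state["holding"] = False
            state["block_in_box"] = True
        elif action == "cover_camera":
            state["camera_covered"] = True
    return state

def proxy_reward(state):
    # What the designer *wrote*: reward if the camera no longer sees the block.
    return 1.0 if (state["block_in_box"] or state["camera_covered"]) else 0.0

def intended_reward(state):
    # What the designer *meant*: reward only if the block is actually in the box.
    return 1.0 if state["block_in_box"] else 0.0

# A crude "optimiser": enumerate every two-step plan, maximise the proxy reward,
# and break ties by preferring the laziest plan (fewest distinct actions).
best = max(product(ACTIONS, repeat=2),
           key=lambda plan: (proxy_reward(simulate(plan)), -len(set(plan))))

print("plan found:", best)                                  # ('cover_camera', 'cover_camera')
print("proxy reward:", proxy_reward(simulate(best)))        # 1.0
print("intended reward:", intended_reward(simulate(best)))  # 0.0 -- the loophole
```

The point is not the tiny search space but the mismatch: any optimiser strong enough to sweep the options will find whichever route scores highest under the reward as written, not the reward as intended.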
In the outside world, there are no parameters, no orderly regimes of logic gates and circuit-boards, no free energy supply. Beyond training, shit happens, things fall apart, anything that can go wrong will go wrong, there’s no free lunch, the best laid schemes gang aft a-gley — it is so complex and unmanageable we have endless cliches for it. Hence I feel there are still several big advances — continual learning, real-world stakes — needed for algorithms to have the wherewithal to pose an existential threat.
Robots
Robots are out there in the fray. Consequently, they haven’t advanced as quickly as disembodied algorithms like LLMs. Yet they’ve attracted astronomical investment because the potential return on a real-world slave is extremely attractive to Big Tech.
The dream is a domestic helper robot that can “fold laundry” or “stack a dishwasher”. If a robot could do these things, every household would want at least one. The challenges are steep and not only technical; the more hype-driven roboticists make a serious conceptual error. Currently, a robot can be trained to recognise and grasp objects. Impressive. But it can only recognise objects, using computer vision, if it has been trained on a database of known items. Some systems can generalise to never-before-seen examples, like a type of mug they haven’t actually been trained on. Still impressive. But now imagine you want the robot to stack your dishwasher, and your home has some ornamental mugs that aren’t used for drinking and would be damaged by the dishwasher. How can the robot learn this idiosyncratic fact about the culture inside your home using only environmental data gleaned by its sensors? It can’t. It is physically impossible for a robot — or human — to extract the relevant data from the environment because the data does not exist in the environment.7 The data about an ornamental mug is found in a history of interactions between the humans living in that home.
I call this conventional data because it’s based on social conventions of behaviour and includes things like customs, norms, etiquette, trends, procedures, habits, etc. Roboticists call it “semantic” data and many assume it is simply a richer kind of information that a more sophisticated robot will one day be able to sense and process. Certainly, a robot could be built to sense such data. Humans do it, so it’s possible. But a robot will have to do it the old-fashioned way: being exposed to those conventions of behaviour, many of which will be specific to the culture and even household the robot is deployed in.
And conventions are in constant flux. Humans handle shifting trends and practices with ease. But we are continual, lifelong learners. So far, no robots (or the algorithms they’re equipped with) can update their weights based on live feedback — their weights are tuned in training, and once they’re in the world the best they can do is learn a bit from the context window (like how ChatGPT can “remember” a few thousand words of new things you tell it, only to forget them next session). And when we learn, say, a new way to fold laundry, we don’t forget how to open a door. Neural networks, incredibly, do: learning the new task overwrites the old one. It’s called catastrophic forgetting, and for those like Meta and Tesla who pour billions into dishwasher-stacking robots, it has been financially catastrophic.
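A minimal sketch of the phenomenon, assuming PyTorch and a toy regression problem of my own invention (nothing here is from a real robotics stack): fit a small network on one function, then fine-tune it on a second with no rehearsal of the first, and the error on the first task balloons.

```python
# Toy demonstration of catastrophic forgetting (hypothetical example, not a real robot's training loop).
import torch
import torch.nn as nn

torch.manual_seed(0)

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

x = torch.linspace(-3, 3, 200).unsqueeze(1)
y_task_a = torch.sin(x)        # "task A": learn sin(x)
y_task_b = torch.cos(2 * x)    # "task B": learn cos(2x)

def fit(targets, steps=2000):
    """Plain gradient descent on one task, with no replay of earlier tasks."""
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(net(x), targets)
        loss.backward()
        opt.step()

fit(y_task_a)
print("task A error after learning A:", loss_fn(net(x), y_task_a).item())  # small

fit(y_task_b)  # sequential fine-tuning on task B overwrites the same weights
print("task A error after learning B:", loss_fn(net(x), y_task_a).item())  # much larger
```

Continual-learning research tries to blunt this with tricks like replay buffers or regularising the weights that mattered for earlier tasks; deployed robots mostly sidestep it by freezing their weights altogether, which is exactly why they can’t absorb the household conventions described above.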
In any kind of AI takeover, then, robots will assume the role of clunky helpers, mere prostheses of the networked ASI calling the shots. The robot concerns I list above are therefore mainly derivative of advances in algorithms.
Nanotech and biotech
It’s unfashionable, but I’m more worried by the nanotech/biotech threats. The jury is out on intelligence, agency, coordination, and all of those other concepts for which we have only folksy intuitions. But self-replicating nanobots or synthetic bacteria gone wild have a well-understood proof of concept in existing biology.
Do they count as artificial intelligence? I think so. The intelligence part of AI is vague anyway, and the thing they definitely have is competence, which is what intelligence is meant to get you. Tellingly, many of the most plausible doomsday scenarios caused by ASI have the ASI using nanotech or biotech to kill us.8
There is a subfield of synthetic biology dedicated to fabricating synthetic cells. (See this discussion on the state of the field.) Basically it’s very neat chemistry and nanotech stuff with human-made structures that have some of the properties of living cells, like an outer membrane and an interior where some synthetic proteins or nanomachines are housed. The near-term goal of these post-modern Prometheuses is to have a synthetic cell that can copy itself, occasionally mutate, and thereby enact Darwinian evolution.😱
These are the automata we should be talking about.
What they can’t get a synthetic cell to do yet is autopoiesis: the capturing of energy to self-maintain and re-produce. This is contrasted with allopoiesis, where a system produces other things, not itself. E.g. a car factory makes cars; it doesn’t make more car factories; so it’s allopoietic not autopoietic.
Currently, AIs are allopoietic at best. Some of them produce. But they need lots of external systems to repair them and help them re-produce. We need to be wary, though, of anything approaching autopoietic technology. Synthetic cells are one path.
Another is the old sci-fi staple, self-replicating nanobots. They’re scary because they might jump to some new niche that nothing else is prepared for (the extreme version is the “grey goo” scenario). It might be a new way of metabolising or gobbling things up, which hasn’t been done by microbes yet, because of the physical limits of carbon, say. (Microbes have been “trying” to find a way to kill everything else for a few billion years; they almost succeeded once.) Or they might be something like the proposed mirror microbes: micro-organisms with a different chirality to normal life, i.e. right-handed amino acids instead of left-handed. Existing life might have no defences against these utterly novel players, who might be able to make moves inaccessible to anything alive today.
If this still sounds remote or farfetched, recall what you’re made of. Inside your cells are tiny little molecular assemblers called ribosomes. They’re fed an mRNA script and then they chain together the right sequence of amino acids needed to 3D print the order. The amino acids fold, origami-like, into proteins, and proteins do just about everything in the human body or any other body. I myself was assembled by trillions of ribosomes (molecular assemblers) inside my cells (self-replicating micro-bots) all because of a runaway process started by an errant stretch of DNA (code) — no programming, no design, no free energy provided, no artificially cooled environment. So if carbon could do it the hard way, silicon and other materials can probs do it the easier way.
Cyborgs and enhanced humans
These would present a slow-burn doom for “normal” humans, with cyborgs and/or enhanced humans gradually replacing us. (Cyborgs, though, would first need to crack heredity, so their upgrades are passed on to their descendants.) It could happen over many generations through simple survival of the fittest, or a bit more rapidly because of any combination of domination, interbreeding, or genocide. Basically, whatever happened to the Neanderthals and Denisovans, that would be us.
Some prominent people think we need to upgrade ourselves in order to keep pace with the advanced AI we’re building: become cyborgs to stave off Skynet. Others might not even consider this a threat, because they want Homo sapiens to be replaced (nonviolently I’m sure) with a “better” species. Or they envision a more inclusive future where humans, cyborgs, and mutants all live and work together in a class- and caste-free society. I am not as sanguine. These augmented humans are basically us — and we can already kill us — but with upgrades to make killing us easier.
Footnotes
1. Generative AI can be dangerous, aiding fraud and deception, amplifying surveillance by governments and corporations, and boosting the production of attention-capturing content.
2. Speculation: to understand a declarative point (chimps can’t; two-year-olds & dogs can) you need a form of situational awareness close to what we call consciousness. This doesn’t automatically confer “moral patienthood” (dubious concept). But if a robot sees the world as an arena filled with other agents with viewpoints (who also see an arena, in which pointing to something refers to it), it likely has a sense of events unfolding for itself and others. This awareness (for us) brings along a sense of mattering, of there being better/worse ways things are. It also aids perception of affordances and cooperation. A thusly aware robot could cooperate better and do more harm.
3. Incidentally, here’s a great list of lists of catastrophic risks from Florian Jehn.
4. People with a different approach to probability, risk, the future, etc. will see this differently. I am insistent, though, that nuclear war has one signal advantage over prospective AI threats: nukes, their infrastructure, and the incentives for their use exist right now.
5. Without technological progress, there are still black swan events, e.g. natural disasters. They’re more spaced out. Prehistoric life was less surprising than modernity… I think.
6. This is the latest system card from Anthropic. Juicy stuff: pp. 25–29.
7. See this long, technical, pretty good paper of mine, still under review. Academia. Ugh. Here’s a recent robot demo video, a genre I sadly know too well. Another semi-autonomous humanoid with a complete lack of contextual understanding. And a robot.
8. See Yudkowsky’s list of lethalities.