Let me digress immediately to talk about workplace toilets. They are often horrific. At my current workplace, it’s not unusual to find a toilet seat so utterly beshat that no amelioration with industrial chemicals will ever make me feel safe to use it again. And where, you might wonder, is this extremely rustic place that I work? Maximum security prison? Black market shipping interest? The Western Front in the winter of 1916? Nope, I am an academic researcher at a high-ranking Australian university.
When I try mentally to reconstruct the events that led to what I see in these toilet stalls, I can only envision postures and actions that are biomechanically implausible. Did the person really sit on the toilet, lean forward as much as possible, tuck their head between their knees, but then engage their quads, thereby lifting their buttocks off the seat the better to angle their fundament to the horizon, guaranteeing the excrement falls, not into the bowl and obscurity, but up and onto the rim and even — breathtakingly — the very seat of the commode, where it can live on and influence events in my faculty?
The other thing that runs through my mind, in high-fidelity audio (I’m an inner speech person), is: “We’re not gonna make it, are we?” It’s a line from Terminator 2: Judgment Day, a bleak-as-fuck masterpiece, you’ll agree, from James Cameron.
The film is also a cultural reference point for the AI apocalypse. In the Terminator universe, the Skynet system becomes self-aware and ungovernable, so its handlers try to shut it down. It retaliates by nuking us.1
Some of us still can’t use a toilet and none of us has solved the current alignment problem of nuclear weapons — namely that our leaders’ interests in nuclear posturing and maintaining mutually assured destruction are pathologically, unacceptably unaligned with our interest in not dying in a nuclear winter. So whatever the precise threat of AI is, I worry over our ability to anticipate and handle it, given our current failure on nuclear weapons and our sorry efforts on climate change, come to think of it. Plus the toilet thing.
Last year, I finally started engaging properly with the debate over AI alignment. Amazingly, since March, it has also become part of my job. I find myself trying to master the topic in the very midst of an AI take-off. It is not an AGI (artificial general intelligence) take-off, but it is certainly very different from any AI boom of the past.
My views have changed. I was excited by LLMs last year, which performed better than I expected. And, smugly, I felt they confirmed my views about language. (More on that in a later post.)
On alignment, I thought some of the assumptions of AI doomsters were mistaken. Combing through LessWrong posts and reading Superintelligence — carefully this time — I felt like the notions of agency, goals, motives, drives were bonkers.
I started drafting an academic paper and some blog posts. In doing so, I engaged with the work of AI sceptics, to see if anybody was already writing my critique.
Reading the case against AI alarmism made me reconsider. What I found was a cavalcade of comically bad arguments.
“In the 1950s people said AI was just around the corner, that didn’t happen, therefore current concerns are hype.”
“Engineers prioritise safety when designing other products that we routinely use, like aeroplanes and cars, therefore AI will be the same.”
“I tested GPT-4 on what I consider personally to be a key task for intelligence and it couldn’t do it. Therefore AI is nothing to worry about.”
“There have been concerns about other technologies in the past and they turned out alright, therefore this will be the same.”
Of course, this is how everyone argues all the time about everything, it’s not unusual in that sense. But these arguments are being made by philosophers, AI experts, and computer scientists. They are desperately weak.2
So I dug deeper into sceptical territory to find more nuanced reasons to doubt AI alarmists. Here I found arguments mainly about problems with the underlying epistemology, or approach to knowledge, of people saying that AI can become an existential threat. These arguments were backed by evidence, well-made, and often convincing. But I thought it was weird that they didn’t specifically engage with the concrete predictions of alarmists:
AGI might be closer than we think,
the take-off could be rapid,
it’s likely to be hostile to human interests,
it poses an existential risk.
These critiques focus on the alarmist’s worldview. The alarmist is a longtermist, or a Bayesian, or a tech enthusiast, or a rationalist, or a transhumanist, or they’ve never worked in tech so wouldn’t know about algorithms, or they used to work in tech so they’re probably embittered.
Maybe the alarmists do have worldview problems. That still ain’t an argument for why an AGI catastrophe can’t occur. And most of this is just ad hominem.
The final genre of criticism comes from people calling into question the premise of intelligent machines or the idea that human-like intelligence will ever be found in silicon. This genre is the most rigorous — it uses evidence and even imports ideas from cognitive science, linguistics, ethology, etc. A common motif is:
Even the best AIs only appear to be intelligent. They have no world model, when they use language it has no meaning, and they lack any real understanding of reality. Ergo, they’re toothless.
Again, this might be true of current AIs, but it is not at all an argument against any of the dot points above.
This line of thought has a dead man walking feel to it. Like, at some point, it will be possible to have human-level intelligence in a machine or some other weird kind of intelligence that is utterly un-human-like but which is more powerful. Then debates over world models and human cognition are moot. At best, this kind of argument should be a warning not to instil a world model in AIs. And yet, tellingly, these particular critics don’t make that their warning. Instead, they say that AI alarmism is drawing attention away from real problems.
Oops
After reading all these critiques, I suddenly realised two things:
Even if the worldview underlying these dire predictions is mistaken, the concerns or risks don’t actually disappear, they just need to be evaluated in a better way (and the critic invariably fails to do this).
My own critique was very much of this form.
Specifically, I have some stuff about how “goals” are thought of in alignment discourse and some evolution-informed ideas about how systems need certain energy regimes to make things happen. I’ll write about that soon because I think it’s interesting and should affect our predictions. But I no longer see it as a reason to allay fear about AI risk.3
I take this as a reminder that even if some idea sounds weird or implausible, if the expert arguments against it are bad, it’s worth attention.
I’m used to considering that my own view might be wrong. (That’s why I’ve written so little even about the things I’m qualified to write about.) But here I concede I was wrong in a particular way I try hard not to be, i.e. being a smarmy intellectual who writes off an entire discourse/discipline/group/issue because of some sophisticated and no doubt insufferable philosophical point.
I still think nukes are the more pressing issue. They’re an off-the-shelf proven-to-work method of killing virtually everyone and they are good to go, with only one step needed to execute the plan.
And I still think AI alarmism is based in science fiction — although that isn’t the criticism it sounds like. We can’t do accurate forecasting. And the technologies involved don’t yet exist. So it is necessarily science fiction.
That’s not a problem, except from a PR point of view, where it might backfire because people don’t take you seriously. But it might also work by offering vivid fictional scenarios for a dystopian future, around which people can rally. Or people might just persist in using their go-to cultural references like Terminator 2.
Point is, when you’re dealing with unprecedented events, risks that have no historical examples, where you can’t learn from the past and gradually update your models, you have to use imagination — fiction.
Other commentators’ views are also science fiction. Like the millenarians describing all the wonderful benefits brought by an AGI that can help us with worldly problems. It can end world hunger, reverse climate change, kill death, and even solve its own alignment problem. Sounds like a space opera with lazy world building.
And the chillers who think we needn’t worry because AGI can’t be a thing, also need speculative fiction. The future for them is one where hard metaphysical facts are found to fix the limits of technological advancement: no sentience without embodiment, no intelligence without a special understanding found only in the human mind, no will to power without the knowledge of biological mortality. Sounds like fantasy.
Should we worry about the apocalypse?
The case against AI being an existential risk is gossamer-thin and soaked in wishful thinking. I don’t have space here to even summarise the positive case.
The alarmist’s basic point is that the space of possible minds is probably mainly insane. And the space of possible outcomes following the advent of an AGI is probably mainly negative: hell is easier than heaven. (The view is put forward most elaborately by Hell’s-Easier Yudkowsky.)
So, sweeping away all the sci-fi and all the forecasts; sidestepping the philosophical underpinnings of goals, intelligence, consciousness, alignment; ignoring the fallacious arguments against alarmism — I now think this:
We are currently investing billions of dollars into trying to create something that will be smarter than us, more powerful, and necessarily unpredictable. In other words, we are paying to dig our own grave.
It doesn’t matter if we’re talking about an AGI, an all-powerful institution, a genetically engineered race of super-soldiers, or an advanced alien civilisation. To deliberately engineer a confrontation with something far more powerful than ourselves is suicide.
AGI might be impossible for a long while. But we’re trying to make it happen, we’re building the incentive structure to hasten it, we have the best people on it, and there’s a lot of money to be made along the way.4
A while ago I tweeted this. I can’t embed the Tweet because of an ancient feud, but the text is this:
“When we build a machine smarter than ourselves it will build a machine smarter still…”
How fucking dumb would it have to be to build a machine that will supersede it, oh.
Are we going to make it?
We’ll likely be hit soon by non-apocalyptic AI problems: a typhoon of cybercrime, electoral interference, fake news, job losses, sludge content, and other tropical shitstorms caused by increasingly agile AI systems. On top of that is the ongoing pillaging of our personal data for the enrichment and empowerment of the big tech 5, now abetted by LLMs.
And I don’t think current AI paradigms are actually capable of what a layperson would call agency, goals, or actions. So we have some(?) amount of time to slap ourselves in the face and sober up.
But the way people use toilets is not encouraging, as a general indicator of incompetence. Presumably these scoundrels don’t treat their own toilet like this, so there’s a tragedy of the commons aspect to it. But there’s also a singular ineptitude here, a failure to perform even the most basic functions of the organism. For this reason — and I really mean this — I have no confidence in people skillfully managing unprecedented and complex risks. It is absolutely possible for us to do it and some people are able to prescind from current circumstances or lazy analogies from the past. But there’s no reason to assume we inevitably will tackle this problem, other than wishful thinking.
I’m on this for next year at least. I have no idea where my thinking will be at the end of it.
Next post, I will offer an analogy to think about AI alignment drawn not from science fiction, but history. It seems to me that the debate is not between luddites and techno-utopians, nor optimists and pessimists; it is between Churchills and Chamberlains. And right now I see a lot of good people being total Chamberlains.
1. As Tom Chivers notes in his great book The AI Does Not Want to Harm You, people who don’t work in AI alignment immediately mention Skynet when prompted by the notion of rogue AGI.
2. Examples are endless. But the most efficient way to canvass bad arguments is to follow Yann LeCun on Twitter. That guy is definitely way smarter than me and he’s a pioneer in several huge tech advances. He might even be a swell guy, all-round legend, great father, kind mentor. But he is a parodically bad arguer. He stands out — on Twitter mind you — as a Bad Take merchant.
3. It’s a bit like becoming sceptical of aspects of climate science, such as the accuracy of long-range forecasting. This has typically made me more concerned about catastrophic climate change, because I doubt our ability to marshal all the sciences, and because of our general inability to forecast. So the future could be worse in ways that are not even imagined by the climate models.
4. One thing I found damning: I can find virtually nothing in the literature from security and strategic studies about AGI or alignment. There are some recent articles about AI on the battlefield, drones, etc. but nothing to do with AGI or even LLMs. This worries me. If I worked in that field this would be an obsession. I did see some intel agencies are hiring AI specialists right now, so I assume there is just a lag. But a lag of like eight years since it has been a well-known issue and one whole year since the public release of mass-use products suggests to me that “the authorities” are slow, dim, and, for now, ill-equipped.
"Maybe the alarmists do have worldview problems. That still ain’t an argument for why an AGI catastrophe can’t occur." <-- This is a very important point. I think most of the alarmists do have worldview problems, but they also seem to understand the technical problems much better than the non-alarmists. To acknowledge that fact isn't too endorse the alarmists' worldview.
Thank you for this, the best article I've read on such a complex topic. I certainly did not expect to laugh so loudly whilst contemplating the demise of our civilisation. Unfortunately though, I'll never unsee the visual of the toilet position, hehe :))