I’m humbled by a recent flood of new subscribers — welcome! It’s AI again this week. Next week I’ll return to bookish matters: my five-year experiment of not reading anything by white men.
The balance of power between AIs and humans in the future — perhaps near, perhaps far — is presumed to come down to whether our values align with theirs. This is the alignment problem.
But given the vagueness of values, this is a terrible way to analyse the problem.
Like, what are values? What are our values? Are our values currently aligned with each other? Are our long and short term values aligned internally? Do our agreements generally hold? Are values ever compromised? Do people ever lie or defect? Do corporations ever act badly? Are governments ever corrupt? Are incentives ever warped? Do plans go astray?
The prospect of needing to align an AGI’s values with our own is currently science fiction and the idea that it will be straightforward is fantasy.
Can we think about this strategically?
Game theory — you know, the prisoner’s dilemma and that type of thing — is a better way to think about potential conflict between parties.1 But I wonder if game theory, while more rigorous, is also misleading. Game theory works well for scenarios where the players are competing for limited resources. It also works for modelling scenarios where cooperation can arise, given certain conditions.
But it doesn’t really work for scenarios without cooperation or meaningful competition. I mean scenarios where one player is vastly more powerful than the other. In highly lopsided scenarios, the superior player has no incentive to cooperate, and has so little downside to squishing the inferior player that they might as well do it and harvest any resources that are unlocked. These scenarios aren’t really “games”. For the weak, there are no strategies to play, and for the strong, their strategies are independent of the actions of the weak.2
The game-theoretic analysis of me versus an ant is stark and salutary (hopefully like this Substack). A single ant who enters into some competition with me, like trying to eat some of my food at a picnic, is doomed. Switching to alternative strategies won’t work. The only way to survive into future game-theoretic scenarios is not to play with me at all. It so happens that I’m an animal lover, and that includes ants, so I’m given to gently brushing ants off my food rather than squishing them. But the next human the ant interacts with might be a squisher. The crucial point is that whether or not the human is a squisher has nothing to do with the ant’s behaviour. The ant cannot play any strategy that will stop a squisher when it meets one.3
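To make the lopsidedness concrete, here’s a toy sketch in Python (all payoffs are invented purely for illustration, not drawn from any real model). It encodes the point above: the stronger player’s best response comes out the same whatever the weaker player does, so there is nothing for the weaker player to strategise about.

```python
# Toy "ant vs picnic human" payoffs, invented purely for illustration.
# The human's payoff doesn't depend on the ant's choice, so the human
# has a dominant strategy that the ant cannot influence.

human_actions = ["brush_aside", "squish"]
ant_actions = ["approach_food", "stay_away"]

# payoff to the human for each (human action, ant action) pair
human_payoff = {
    ("brush_aside", "approach_food"): 0,
    ("brush_aside", "stay_away"): 0,
    ("squish", "approach_food"): 1,  # trivially convenient, costs nothing
    ("squish", "stay_away"): 1,
}

# payoff to the ant
ant_payoff = {
    ("brush_aside", "approach_food"): 1,    # gets a crumb, survives
    ("brush_aside", "stay_away"): 0,        # goes hungry, survives
    ("squish", "approach_food"): -100,
    ("squish", "stay_away"): -100,
}

# Whatever the ant does, the human's best response is the same.
for a in ant_actions:
    best = max(human_actions, key=lambda h: human_payoff[(h, a)])
    print(f"If the ant plays {a!r}, the human's best response is {best!r}")
# Both lines print 'squish': the ant's strategy never enters into it.
```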
Likewise, there’s no “game” between the bombardier on the Enola Gay and the people of Hiroshima, or between the dodo and some hungry Dutch sailors.
If at some time in the future we create a serious AGI, we should also have significantly transformed ourselves into, I don’t know, transhumanist cyborgs or something more powerful. Otherwise we’ll be creating — by design — an entity that could crush us like ants if it felt we were in competition for its resources.
“But what if it’s aligned?” I hear Yann LeCun, chief scientist at Meta, asking.
Here I part company from existing alignment wisdom. If it is aligned then it’s not an AGI.
The reason is… well, I’m still trying to figure out how this works conceptually, but it goes a little something like this:
The only way to align values, even imperfectly, is by having interests in common. With some shared destiny, both parties are incentivised to row in the same direction. This is the essence of a coordination game and the wellspring of cooperation and symbiosis in the natural world.
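As an illustration of what “rowing in the same direction” means game-theoretically, here’s a minimal coordination-game sketch in Python (the payoffs are my own invented numbers). Both players only do well when they pick the same action, and the stable outcomes are exactly the ones where they coordinate.

```python
# A minimal coordination game, with payoffs invented for illustration.
# Two parties each pick a direction to "row"; they only do well when
# they pick the same one, i.e. their destinies are tied together.

actions = ["row_left", "row_right"]

# payoffs[(a, b)] = (payoff to player 1, payoff to player 2)
payoffs = {
    ("row_left", "row_left"): (2, 2),
    ("row_left", "row_right"): (0, 0),
    ("row_right", "row_left"): (0, 0),
    ("row_right", "row_right"): (3, 3),
}

def is_nash(a, b):
    """True if neither player gains by unilaterally switching."""
    p1, p2 = payoffs[(a, b)]
    best_p1 = max(payoffs[(alt, b)][0] for alt in actions)
    best_p2 = max(payoffs[(a, alt)][1] for alt in actions)
    return p1 == best_p1 and p2 == best_p2

for a in actions:
    for b in actions:
        if is_nash(a, b):
            print(f"Stable outcome: {a}, {b}")
# Prints the two coordinated outcomes and neither mismatched one.
# The shared payoffs are what make rowing together worthwhile.
```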
But a sufficiently flexible, competent, self-sustaining entity, one deserving of the name AGI, will by definition not need us to further its interests. It can take care of itself.
In fact, if it is interested in any of the same resources as us (like the supply of power or compute) then it will also have a stake in downsizing the human population.
The only safe way to merge our interests would be to merge our selves. If we literally incorporated the AGI, as in some cyborg or brain-computer interface scenario, then we’d avoid conflict. But that still wouldn’t be alignment with AGI — we would have become the AGI.
I’m making certain assumptions here about humans, goals, evolution, competition. I might just not comprehend alignment in some important way.
Still, I have misgivings. Even if alignment is possible, it could just be damned hard.
It’s hard to get two human beings to align. Even a married couple with children, who have genetic and financial interests in common, as well as values and culture and personal history, and a legal contract committing them one to another and social pressure from a community to remain bonded — even in such cases it’s not unheard of for alliances to break down, often to the point where former spouses are so unaligned that courts intervene and spiteful hatred consumes them. Aligning two intelligent agents is no foregone conclusion, even when the most well-tested procedures are used.
As a general rule, I’d suggest that it’s easiest to get alignment when the players have comparable power. Aligning the powerful to the powerless is an historical struggle.
Go for control instead — even that’s hard
Materially, life in a developed place is pretty damn good. There are supermarkets and audiobooks. You’re allowed to be gay. And clean water flows endlessly from taps within one’s home. I’m not a modernity hater.
But it’s embarrassing how little control we have over the people who control our world. It’s not that we should have positive control over our world as a mass population. Democracy of that sort is infeasible. But we at least want negative control: the ability to stop people who might ruin things. Representative democracy is meant to grant that kind of control.
But right now, an ordinary schlemiel like me, even if I team up with all my friends, can do nothing to stop a Russian general from starting a nuclear war. Nor can we stop the fossil fuel companies from capturing parliament with their lobbyists and donations. This is what is unacceptable. These are the great shandas, the disgraces, of contemporary life. The conditions of life grow ever more wonderful even as we lose our grip on keeping them that way.
But there’s no law of nature preventing us from galvanising collective action, renewing democratic institutions, and finding better ways of curbing excess power. Climate change activists do sometimes win.
With the large AI companies, we have a two-layered problem of control. A few people in a few corporations may well be accelerating the confrontation with AGI. The very reason we fear AGI is that we cannot control it. Meanwhile we cannot control OpenAI or Anthropic, Meta or Alphabet.
So the reason we shouldn’t make an AGI is the same reason we might end up doing it: we can’t align the values of those more powerful than ourselves. We need to settle for control. At present, we don’t have much of that either.4
I commend the work of Jesse Clifton at the Centre on Long-Term Risk for seriously studying this. Those people know their stuff and I’m probably wrong about everything.
Not strictly true. The weak agent should still play whatever will give the best payoff. But if the payoffs are things like ‘die today’ vs ‘die tomorrow’, it’s cold comfort to be playing the better strategy. There are games called Stackelberg games, where a leader agent acts first and follower agents then respond. But this still doesn’t capture outright squishings.
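For what a Stackelberg game looks like mechanically, here’s a toy leader-follower sketch in Python (the actions and payoff numbers are my own invention, chosen only to illustrate the structure). The leader commits first, the follower best-responds, and the leader chooses by anticipating that response; with numbers this lopsided, nothing the follower can do changes what the leader prefers.

```python
# A toy Stackelberg (leader-follower) game, with invented payoffs.
# The leader commits to an action first; the follower observes it and
# plays its best response; the leader picks its action by anticipating
# that response (backward induction).

leader_actions = ["share_resources", "take_everything"]
follower_actions = ["comply", "resist"]

# payoffs[(leader, follower)] = (leader payoff, follower payoff)
payoffs = {
    ("share_resources", "comply"): (5, 3),
    ("share_resources", "resist"): (2, 1),
    ("take_everything", "comply"): (8, -5),
    ("take_everything", "resist"): (7, -10),
}

def follower_best_response(leader_action):
    return max(follower_actions,
               key=lambda f: payoffs[(leader_action, f)][1])

def leader_choice():
    return max(leader_actions,
               key=lambda l: payoffs[(l, follower_best_response(l))][0])

chosen = leader_choice()
response = follower_best_response(chosen)
print(f"Leader plays {chosen!r}; follower responds {response!r}")
# With these numbers the leader takes everything regardless, and the
# follower is left choosing the less bad of two bad outcomes.
```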
This is the benevolent dictator problem again, which I increasingly favour as the best metaphor for an advanced AGI of the future (purely speculative at this stage). Once you create the office of dictator (over which you have no control) you just have to hope the dictator is benevolent. But even if they are, the next one might not be, and you don’t control that either.
This is part of a much bigger issue of governance, democracy, freedom, etc. I recommend Shoshana Zuboff’s extraordinarily good book, The Age of Surveillance Capitalism. It’s about social media and the use of our data; it pre-dates the latest AI but is totally relevant. I also recently watched this presentation from the guys who made The Social Dilemma documentary: it’s a good primer on some of the non-existential threats of AI.