Don't trust machines unless they feel shame
And why you can't trust psychopaths or animals — except dogs
Last month I was at the Australian AI Safety Forum. There was a lot of talk about how to build trustworthy AI. “Trustworthy” sounds anthropomorphic to me. Do we speak of trustworthy bridges, toasters, and spreadsheets? Kind of. But when I “trust” a bridge not to collapse I’m really trusting the engineers to have designed it properly and, more abstractly, I’m trusting the people in the regulatory system who ensure high standards.
I think that if you’re in a relationship with a Roomba, a Tesla, or ChatGPT in which you feel like you can trust it because of its tenuously human-esque qualities, then you might want to examine all your relationships, because you might be someone highly vulnerable to exploitation.
Sure, a hypothetical AI that really did behave like a human should be treated like a human: endowed with the rights of a legal person, held responsible by the law, and prevented from accumulating too much power. Such an AI would be eligible to win (or lose) your trust. Until we have that (we might never have that) we should treat autonomous systems the same way we treat non-autonomous systems: as machines, tech, artefacts.
We don’t even need to find an intermediate category that blurs these boundaries. We have existing intermediate categories for not quite full legal persons already: certain nonhuman animals, children, persons of diminished responsibility, etc. Currently, AI is not even anything like these.
But one day it might be.
This forces us to nail down, ahead of time, what it is that makes a full legal person different to a robot, a chimp, a child, or a patient in a coma.
And I don’t think high intelligence or agency is sufficient. Because we already have some highly intelligent agents who nonetheless cannot be trusted like full legal persons — psychopaths.1
The legal system is onto something
There’s no reason to suppose that modern legal systems, which evolved from medieval conventions, would arrive at the best theory of what it is that makes a person biologically different to an ant or a shrub. Neither would we expect the law to say much about the prospect of machines that can think and act. But it comes reasonably close. Which I’ll address in a moment.
First, where the legal system errs is in conflating consciousness with legal responsibility.
A sleepwalker, for example, is deemed not guilty of their crimes because they lack consciousness. Volition and mens rea are thought to be products only of conscious cognition. This lags behind research into unconscious cognition (basically all our cognition) and unconscious decision-making.
But the legal system is savvy and prescient in its treatment of pure psychopathy.
Bereft of the social emotions of compassion, guilt, and shame,2 the psychopath has no good reason to refrain from harming someone who gets in the way of their selfish aims — except a rational fear of being caught and punished. In this, the psychopath is akin to the tiger (or even the paperclip maximiser). In some cases, psychopaths are treated as criminally insane and therefore not fully responsible for their crimes.
I’m sympathetic to this view. It might seem outrageous that a perpetrator of callous violence would be treated leniently. But I see a crucial difference between a normal person and a psychopath. The psychopath is impaired. They cannot feel remorse for committing an act that harms others.3 This renders them unable to punish themselves and immune to some of the factors that stop the rest of us from harming others.
Back to agency. Agency is often correlated with consciousness, but it is not caused by it. A sleepwalker can be an agent. Like an advanced robot, they're flexible decision-makers who can protect their own interests while unconscious. Sleepwalkers should, actually, be a little more liable than they are for their crimes.4
A psychopath, meanwhile, is presumably both conscious and agential yet is perhaps less responsible, less of a moral agent, because they are deaf to the feelings that guide prosocial behaviour. The psychopath is a problem for the law and politics.
Coevolution with psychopaths
One way to write human history is as a series of adaptations to manage psychopaths.
Once, we were like baboons. The strongest male triumphed through violence and left more descendants. These alpha males had no reason to be generous or to sacrifice anything for anyone else.
Then came language and throwing. Now others could conspire (language) to kill from a distance (throwing) any tyrannical alpha male. This altered the selective landscape, favouring people who could form coalitions. Bad day for psychopaths, good day for team players. My prediction is that even in hunter-gatherer societies today, it is hard to survive and reproduce as a psychopath.
The Bronze Age returned the psychopath to power. Warlords and chieftains rose to power, the more pitiless the better. Genghis Khan left many descendants.
Belatedly, civil society found ways not to eliminate psychopaths from government (see reality) but to dilute their power. Democracy is one such instrument. It's blunt. It stops anyone from harnessing too much power, including altruists. But at least the fallout from having a psychopath in charge is contained. In my country, two of our last six prime ministers have been bona fide psychopaths, and they did relatively little damage.
Naturally, a shrewd psychopath will find other systems and hierarchies within modern society where their ruthlessness is well hidden or rewarded. They become surgeons, CEOs, media personalities or, most frequently in my experience, real estate agents.5
A psychopath has some advantages. They have fewer considerations to weigh in decision-making. They simply do whatever benefits them, without worrying about others’ interests. Although, when dealing with others who do care about others’ interests, this lack of insight might be a handicap. And although psychopaths are excluded from many cooperative opportunities, they’re able to exploit others without feeling bad. They can freeride off all the institutions in society built on normal people’s propensity to trust one another.
Nonetheless, I have empathy, ironically, for the psychopaths. Imagine being born into the world and having to start playing a game where it is obvious everyone else is different. Like any creature, the juvenile psychopath does what they can to survive in a hazardous world. Cat-like, they look out for number one. But there’s a difference. A lion, cast out from its pride, can secure enough calories to live on its own. It might not reproduce, but it can hunt and kill on the savannah. Its body is the optimum vehicle for doing so. Humans are different. Not only are we born premature and require coddling for years, it’s not until we’re young adults that we bring in our share of calories to the tribe. Until then, we’re subsidised by older members. And most of the calories we do procure are in concert with others. Ostracism is a death sentence. The psychopath must be subtle, or they’ll find themselves surrounded one day by a circle of the people they have exploited. These accusers, who apparently experience alien feelings of shame and compassion, also evolved a smouldering sense of justice and will go out of their way to punish cheaters. They will stone the sociopath to death and feel righteous doing so.6
The key point here is that normies trust one another over and above simple reciprocity because of that extra incentive to cooperate provided by the social emotions (shame, guilt, compassion).7 It’s more than just you scratch my back, I’ll scratch yours. Baboons do that. Because you feel guilt, dear non-psychopathic reader, you’re just a little more likely to scratch my back even if you’re uncertain I’ll reciprocate.
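One way to see how guilt tips the balance is a toy expected-payoff comparison in a one-shot helping interaction: for a purely selfish agent, helping only pays if reciprocation is likely enough, but an internal guilt cost for defecting lowers that threshold. This is a minimal sketch; all payoff numbers are illustrative assumptions, not measurements.

```python
# Toy expected-payoff comparison in a one-shot helping interaction.
# p = probability the partner reciprocates later.
# Payoffs (illustrative assumptions): helping costs 1 now; being
# helped back later is worth 3; defecting costs a guilt-prone agent
# g units of internal punishment (g = 0 for the psychopath).

def cooperates(p: float, guilt: float) -> bool:
    """Cooperate iff the expected value of helping beats defecting."""
    ev_help = -1 + p * 3    # pay the cost now, maybe get helped back
    ev_defect = 0 - guilt   # keep your resources, feel bad (or not)
    return ev_help > ev_defect

p = 0.25                            # reciprocation is quite uncertain
print(cooperates(p, guilt=0.0))     # psychopath: -0.25 > 0, so False
print(cooperates(p, guilt=0.5))     # guilt-prone: -0.25 > -0.5, so True
```

The guilt term does no work when reciprocity is assured; it matters exactly in the uncertain cases, which is where the essay locates its function.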
Why aren’t we all psychopaths?
Mainly it’s because psychopaths miss out on lots of back scratches.
Still, where is the advantage to actually feeling bad? Why does guilt cause us negative affect, and even bodily pain, rather than a simple cognitive awareness of having done wrong? Couldn’t we have the propensity to cooperate and not screw people without the sinking feeling in the stomach?
Here’s the stark evolutionary reasoning. It makes genetic sense to help your identical twin, and to a lesser extent your sibling, and to an even lesser extent your cousin, because if they survive and reproduce they’re passing on a lot of your genes because you have genes in common. When helping non-kin, it’s a lottery ticket. You’re hoping that in some indirect way, aiding them will lead to conditions that somehow work out well for you passing on your genes (unlike relatives, they can’t pass on your genes for you).
This bet isn’t worth it. So while bees will gamble their life to further their sisters’ genes, they won’t do anything for a bee from another hive.
Humans occupy a strange niche. We need to cooperate throughout a long lifetime. We need both temporary alliances for a single mission and lifelong bonds for intergenerational families and child rearing. We need to cooperate with randoms for a stag hunt and a spouse who is often not even our cousin.
How can we trust these people not to renege or betray us? We can’t. We can never be sure. What we need, what evolution always calls for, is a hard-to-fake signal that we can use to judge someone’s trustworthiness with pretty high confidence. The textbook example of a costly signal is the peacock’s tail indicating good health: you can’t fake a resplendent appendage that requires good genes, lots of calories, and strong immunity. The other classic is the gazelle’s “stotting” behaviour: if you can gratuitously prance while fleeing, you must actually be super agile and fast, thus signalling to the lion to ignore you and chase some other, less athletic gazelle instead.
I believe that emotions like guilt, grief, shame, and sympathetic hurt are our hard-to-fake signal. All of these emotions are risky because they can be hijacked. They’re beyond self-control. They result from others: either how others feel, or events that befall them. It is crazy that our nervous systems and endocrine systems made themselves vulnerable to impacts beyond our own bodies, i.e. the inferred feelings of other people’s bodies.
How do you show that you have something on the line for a cooperative venture? It can’t be that you stake your life on it; that’s impractical for an everyday task that nonetheless requires the parties to sacrifice something. How do you demonstrate skin in the game if it’s not a life and death exercise?
You can demonstrate that you’ll experience nonfatal harm if it goes wrong, or if you exploit the other person. There are outward signs like crying, grimacing, swearing, and apologising. These are, to put it mildly, easy to fake.
The only way to honestly signal that you’re hurting nonfatally is to do it for real, by feeling bad. But you can’t prove what you’re feeling inside. And so, something extraordinary happened. We evolved an inner space, still entirely private, but at the mercy of social harm. Each of us contains a little enclave of others, who maintain an embassy in our hearts. The others cannot see it. But you can prove it’s there by going out of your way to help even when you won’t be punished for not helping, i.e. conscientiousness or altruism.
This sounds fakable. Couldn’t a psychopath do it? Couldn’t they start helping people but not really feel anything about it? But they won’t. Because for them there’s no emotional reward in going above and beyond. There’s arguably no reward for us either — although we often feel pride, or solidarity after helping. Crucially, however, there’s a punishment (guilt) for not doing it. The psychopath doesn’t have that. They do only the bare minimum to guarantee reciprocity. Given a chance to flake, they will.
This whole scenario is weird. We need to honestly signal that we're trustworthy cooperators, and so we evolved new emotions based on others' feelings. And yet merely having these emotions isn't a signal that others can perceive. So their role is simply to motivate us to actually do nice things, the kinds of things that wouldn't get done by an untrustworthy person. We could fake having these emotions and say that we feel bad. But that does no good, because without the real feeling we'd still be likely to ghost, flake, or welch: nothing would sting when we did. What's more, because these feelings are real, and because people generally do at least a little bit to help each other and refrain from cheating, you can rely on people being generally trustworthy, within certain bounds. Other animals can't do this with each other. Humans have always depended — a little — on the kindness of strangers.
This can go catastrophically wrong. I’m not only talking about the obvious fact that a small percentage of the population, the psychopaths, are parasitic upon this norm of good faith. There’s also the mental health risks of being open to others’ feelings. Major depression is common. One can even be so hurt by others’ disapproval or social banishment that one commits suicide. Other animals don’t have these problems.
So the upside must be goddamned huge for it to carry such obvious downside risks.
The rewards are indeed world-changing. The new ability to form a community that divides labour, keeps promises, defends one another, pools calories, cares for infants, and in which any two or three individuals can — if they get their shit together and make a plan — hunt any animal on the planet, instantly propelled these neurotic hominins to the top of the food chain.
(Disclaimer. Obviously I know people often aren’t trustworthy, they have to build your trust over time as you gradually do things for one another. And I know most people are only conditional cooperators: if they find themselves in a high-trust society, they blow with the wind and follow the road rules, pay taxes, and don’t shoplift. If they find themselves in a low-trust society, they blow with the wind and take what they can get. I know all this and I’m far from an ingenuous Pollyanna. But the fact that humans are capable of trust at all is the thing that needs explaining, especially if we think machines can be trusted.)
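The conditional-cooperator point can be sketched as a threshold dynamic: if each person cooperates only when the share of cooperators around them clears their personal threshold, the population settles into either a high-trust or a low-trust equilibrium depending on where it starts. A minimal sketch, with thresholds and starting rates that are illustrative assumptions:

```python
# Conditional cooperation as a threshold dynamic: each agent
# cooperates next round iff the current cooperation rate exceeds
# their personal threshold. Thresholds below are illustrative only.

def step(rate: float, thresholds: list[float]) -> float:
    """One round: fraction of agents whose threshold the rate clears."""
    return sum(t < rate for t in thresholds) / len(thresholds)

def settle(rate: float, thresholds: list[float], rounds: int = 50) -> float:
    """Iterate until the cooperation rate reaches a fixed point."""
    for _ in range(rounds):
        rate = step(rate, thresholds)
    return rate

# Most people here cooperate once roughly 40% of others do;
# one holdout (0.9) barely ever does.
thresholds = [0.1, 0.2, 0.3, 0.4, 0.4, 0.5, 0.5, 0.6, 0.7, 0.9]

print(settle(0.8, thresholds))   # high-trust start: stays high
print(settle(0.1, thresholds))   # low-trust start: collapses to zero
```

Same people, same thresholds: the only difference between the high-trust and the low-trust society is the starting point, which is the "blow with the wind" observation in the disclaimer above.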
Don’t trust machines
This has mainly been about why psychopaths can’t be trusted. But hidden within the psychopath’s advantages are the reasons why the next wave of robots and algorithms cannot be trusted.
Any general decision-making system that acts in the to-and-fro of human affairs will have to cooperate with various humans. They will have to make decisions that are sub-optimal from some humans’ POVs but not others’, decisions that conflict with their company’s interests, decisions that violate their own interests. They’ll need to act when it’s not even clear whose interest is at stake or how everyone will be affected.
Humans cope with this confusion through strong inner critics that keep us broadly in line with our community's interests, as though the community held within us a veto over some of our actions (which can also paralyse us with indecision). People can also make terrible decisions, and it's sometimes unclear whether they could have done better. But we accept this imperfect state of affairs because people can be held responsible, because they have feelings; when punished, they experience remorse, and we experience a feeling of justice. A person who makes, on average, "better" decisions but who cannot be held responsible because they have no guilt is not a better person. Likewise a robot who makes rational decisions but whom we cannot punish (because there's no point; they won't feel bad) is not a person at all. We would instead hold their maker responsible. And in turn, we would never place any trust in the robot itself.
Any future machine that is a free-range decision-maker and cooperator will need to have instilled within it feelings of guilt and shame, or some equivalent. There is no other way to act in the uncertain and largely incomputable domain of human social relations, with all their overlapping, intransitive, and opaque interests. Humans won't allow it.
Having a domestic robot — the pipedream of most robotics companies — that lacks feelings of guilt would be as dangerous as leaving your children with a reliable psychopath or sleepwalker. They might be competent. They might have a selfish reason not to fail. But they cannot be responsible. If someone asks you to "trust" a machine, don't trust the machine and don't trust that person.
Bonus rant about pets
WARNING: cat-people will just hate this.
In one of our sicker moments, we bred into dogs a human sense of guilt. Dogs feel bad when they disappoint us (more so than when they disappoint each other… I think. I’m not a dogologist). Some other domesticated animals, like horses, are perhaps on the way to such interspecies chagrin. But only dogs truly hurt when they let us down. And hence, of all the animals, only the beloved hound will actually risk its own life for its human friend. You can sometimes trust — with all the anthropocentric baggage of that word — your dog.
In a timeless irony, the next most intimately welcomed creature is the antithesis. Like most animals, cats care not whether we live or die. Being mammals, they enjoy curling up with other warmbloods who aren’t threats. And, like all predators, cats are pure psychopaths. Yet they go further into needless sadism, often toying with their prey.
Most birds or mammals will become affectionate if you feed them and build a rapport, thereby proving you’re not hostile. But this doesn’t mean a mouse or magpie will behave altruistically towards you — not in the evolutionary sense that it would for its young, for example. This makes them “sociopathic” by human standards. If you feed and cuddle a human sociopath they’ll generally be affectionate. They’ll even physically protect their own children. But you should never ever expect them to go out of their way to help you or give you something for nothing.
Cats are affectionate after you provide them free room and board. But they also ceaselessly hunt native birds, for fun, and if they find mice, they paw them and cripple them, and often leave them to die in pain without even consuming them.
Here’s a classic piece of cat apologetics:
The behavior [toying with prey] is often misunderstood by people not familiar with cats. It is not sadism or torture, but a survival mechanism meant to exhaust the prey animal so that it can be quickly killed with minimal chance of danger to the cat.
This is applesauce. It wouldn't pass muster in high school biology. Nor would the other hypotheses usually put forward.
Hypothesis: Cats toy with their prey because they’re learning how to hunt.
Why it’s bullshit: Then the behaviour would only be evinced by juveniles.
Hypothesis: They’re making sure their prey is fresh.
Why it’s bullshit: Seeing the prey move once is confirmation it’s alive and not rotten, which is good enough for other predators.
Real reason: Cats do it because they get a thrill from inflicting pain on small animals. They get off on the hunt, and so they prolong the moment, edging themselves like furry little Jeffrey Dahmers.
I’m not blaming them. They evolved to do this, just like orcas. It’s a byproduct of the motivational system to hunt.8 Most predators don’t do this and it seems wasteful. Perhaps, being atop their food chains, orcas and felines can afford to indulge this evolutionary excess. It’s one of those rare bits of pure pleasure found in the animal kingdom, like when cows eat psychedelic mushrooms.
In a final insult to all that is good in this world, all that is based in compassion, trust, and human feeling, house cats secure their unlikely parasitic relationship to us by infecting us with actual parasites that change our disposition towards cats. Why else would we invite the Hannibal Lecters of the animal world into our private sanctums? Because although we are trustworthy and trusting, most of us wouldn't go out of our way to daily care for an unfeeling narcissist. To get us over the line, cats go one step further than human psychopaths and ingratiate themselves by infecting us with Toxoplasma gondii.
I know cats are elegant. They are resourceful creatures with beautiful eyes, and astounding physical prowess. And after we've simped enough for them they even let us pat them and they emit a pleasing and defence-lowering purr. But now that my own toxoplasmosis has worn off (I lived with a cat during COVID), I feel like I'm wearing the glasses in They Live and it is again clear to me that of all mammals they are the most unlikely pets. They can never be trusted. And to some extent their human vassals — "cat people" — should also be looked at askance.
The terms sociopath and psychopath are somewhat interchangeable but neither is used as a clinical diagnosis by psychologists. But criminologists, TV writers, and moral philosophers like these words. Surely we do need a label for that type of person, whom we've all met, who simply has no remorse, no empathy, and generally a high tolerance of risk and a delight in revenge. Psychologists talk about antisocial personality disorder or Cluster B personality disorders or even the dark tetrad, which certainly sounds cool. I'd be surprised if these designations were more valid. I think the jury is out on much of personality psychology and especially areas like this which are hard to study for ethical reasons and because the people I'm talking about rarely self-identify. Those who end up in the criminal justice system may be identified. But your office psychopath or real estate agent is unlikely to get a psychological assessment.
Although they may well feel pride, the other big social emotion. That doesn’t help my argument.
To be clear, I think psychopaths who commit bad crimes should be locked up. Partly for deterrence and to remove dangerous people from society (they respond to incentives, so the law still works on most of them). But they can’t be held “responsible” or punished in exactly the same way as normies, because they’re missing the internal penalties imposed by normal people, that make cooperation and altruism possible.
This requires experiments to test if they have social emotions. If sleepwalkers are effectively psychopaths, then they’re closer to a nonhuman animal or automaton, i.e. they have flexible behaviour but can’t be responsible because they have no social emotions, which makes them dangerous.
And at least one is running a prominent tech company.
It’s uncomfortable for me to recognise that while I’m such a bleeding heart that I muster empathy for psychopaths, I struggle to feel for the officious types and busybodies who enforce rules, sacrificing their own time and resources just to make sure someone else doesn’t get away with something. When I see this I see Auschwitz. What frightens me is that, unlike the somewhat indifferent and purely selfish psychopath, the busybody gets off on punishing others. They feel a surge of oxytocin and probably a sexual frisson from not letting me pick up my package at the post office because of some minor clerical point. And yet, during the Pleistocene, these Karens kept the whole show running by exercising their zeal for rule enforcement and doubtless kept the psychopaths in check.
I inadvertently reinvented something similar to Robert Frank’s theory of emotions as ways of solving commitment problems (which I vaguely read about over the years). I still haven’t read Passions Within Reason (1988), but I flipped through it and he only mentions psychopathy briefly. The story I present here is fanciful, but it does fit with the general and respected idea that emotions evolved partly to direct prolonged behaviour when direct stimuli are not available. You feel fear, for example, so that you keep fleeing even after you can no longer actually see the predator. More complex emotions, like guilt, allow you to make decisions over long time frames that need to incorporate memories and simulations of other people’s behaviours and intent.
I bet people think it’s wrong to morally judge an animal’s behaviour. But think about why that is. Is it because you think morals only apply to those who have human-like feelings of guilt and remorse? That means you can’t morally judge psychopaths for murdering people.
I cover and explore the topic of Trust in my podcast here:
https://soberchristiangentlemanpodcast.substack.com/p/s1-ep-7-scgp-rebroadcast