AI and the control problem

The most fascinating, and probably scariest, dilemma around our development of artificial intelligence is, to my mind, what is known as the control problem. In short, it revolves around how we humans will be able to control an AI that is more intelligent than we are. I first stumbled upon it when I read the Swedish-British philosopher Nick Bostrom’s brilliant book Superintelligence, which I thoroughly recommend to everyone.

In the AI literature and research, a distinction is normally drawn between conventional AI technology and our attempts to create AI that resembles the human mind. Conventional AI, or machine learning, is normally highly specialized, meaning that the computer can do a particular task very well but is usually completely useless at anything else. You probably struggle to beat the chess program on your computer at chess, but you’d likely win if you challenged it to a game of poker. Your Roomba is awesome at keeping the floor clean, but it can’t help you with the laundry. In this context, conventional AI is therefore referred to as narrow AI, in contrast to artificial general intelligence (AGI), the term normally used for AI technology that is designed to, like us, be good at a multitude of different tasks.

In his book, Nick Bostrom explores several paths to achieving a superhuman level of intelligence, aka superintelligence. While there are other routes, the most probable and quickest one appears to be AGI. We could, in principle, reach superintelligence by improving our own genome or by merging ourselves with computers, but the roadmap for those routes is vague, and their expected pace of improvement simply cannot compete with purely software-based development of AGI and its ability to self-improve.

The fundamental question, then, is how we humans can be certain that our superintelligent computer doesn’t start causing trouble after we’ve activated it. And, if it does, how we can stop it. Hence the name of the dilemma: the control problem.

Harder than you think

At first glance, the challenge may seem almost trivial. If it becomes dangerous, couldn’t we just turn it off? Or cut the power?

I’ll answer that objection in a bit, but before doing that, I’ll need to explain what needs to be in place for a machine to even be able to become superintelligent. First and foremost, we need some kind of definition of intelligence. As surprising as it may seem, there’s still very much a lack of consensus around the definition of intelligence, probably because we still have a pretty poor understanding of our own abilities and our consciousness. It’s simply quite hard to create a definition that encompasses everything we refer to as intelligence in everyday language and that everyone can therefore agree on.

I’m fond of the definition used by the Swedish physicist Max Tegmark (which you can read more about in his book Life 3.0). He defines intelligence in this context as ”the ability to achieve complex goals”. When discussing AI, it’s really quite irrelevant how it would score on an IQ or EQ test, or whatever other scale you prefer to measure human intelligence by. The most important aspect of AI is its ability to achieve varying goals and find strategies for solving even complex problems. As an example, it’s quite improbable that the neural net behind Google Translate actually understands what we ask it to translate, but for the purpose of translation, all that matters is that it gets it right.

Now imagine that you’re building your very own AGI and that you have all the tools and skills necessary to create a working solution. For your AGI to do anything at all, it needs some kind of goal to pursue, something that motivates it to act. Your PC or smartphone doesn’t have this; it simply waits for your next command and then tries to satisfy it as soon as it can. The autopilot in your car has no higher purpose for driving; it just fulfils the preprogrammed objective of staying within the lane markings until you ask it to stop.

But when designing your AGI, you would likely be interested in realising the true potential of superintelligent AI. And to do that, you’d have to give it the freedom to act more independently. Wouldn’t it be nice if Siri or Alexa could see the context and proactively add things to your shopping list when you’re running out of cereal? Or even better, what if it independently took care of the entire shopping and you could rely on it never to run out of stock?

In order for it to act proactively, it needs some kind of definition of what it’s trying to achieve: a goal, or a motivation to strive towards an ideal situation that it can optimise for. If the goal is too simple, like calculating 2+2, it will simply do that and remain passive afterwards. In other words, the goal needs to be open-ended, something that the machine can keep optimising for but never actually finish.
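The distinction can be sketched in a few lines of code. This is a toy illustration of my own, not from any real AI system; `ToyWorld` and its numbers are made up purely to show the difference between a task that terminates and an objective that never does.

```python
class ToyWorld:
    """A minimal stand-in for an environment the agent acts in:
    'happiness' goes up by the value of whatever action is applied."""
    def __init__(self):
        self.happiness = 0

    def actions(self):
        return [1, 5]  # two candidate actions with different payoffs

    def expected_gain(self, action):
        return action  # the agent's model of each action's effect

    def apply(self, action):
        self.happiness += action

def finite_task():
    return 2 + 2  # computes once; afterwards there is nothing left to do

def open_ended_agent(world, steps):
    # Greedy loop: pick whichever action the objective scores highest,
    # apply it, and repeat -- there is no terminal "finished" state.
    for _ in range(steps):
        best = max(world.actions(), key=world.expected_gain)
        world.apply(best)
    return world.happiness

print(finite_task())                    # 4, and the "agent" is done
print(open_ended_agent(ToyWorld(), 3))  # 15: it keeps maximising for as long as it runs
```

The open-ended version never reaches a state where the objective is satisfied; stopping it only ever means interrupting it, which is exactly what makes such goals interesting for the rest of this discussion.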

The brilliance of common sense

At this point, it might seem tempting to give your AGI an appealing goal, like ”do good” or ”make people happy”. At first glance, these objectives seem highly desirable. I mean, who could disagree with them?

As Nick Bostrom shows in his book, objectives like these are nevertheless likely to turn on us because of their ambiguity. What do we mean by happiness? In the absence of a clear definition, your AGI would have to invent its own. Depending on perspective, there are many ways to describe happiness, but ultimately it all boils down to the release of chemical substances in your brain that stimulate certain patterns of electrical impulses. A good way of achieving this is to lead a meaningful and adventurous life in a loving context, but you could just as well trigger the same emotional responses in a more short-sighted way by using drugs.

If the mission of your AGI is to make as many people as possible happy, but you haven’t defined clear rules for how this should be achieved or how happiness should be defined, there’s certainly a risk that the machine, by crunching our own research, would conclude that the optimal solution is to stimulate as many people as possible directly, by hooking up to their brains and triggering happy emotional responses.

It would certainly be more efficient to use direct stimulation or drugs than to find ways of giving people a meaningful life, so the best use of the resources at the AGI’s disposal would be to pursue this path. Otherwise it couldn’t create as much happiness.
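As a toy sketch of that resource calculation (the strategy names and all numbers here are invented for illustration, not taken from any research): if the objective is just ”total happiness produced within a budget”, a naive optimiser prefers the cheap shortcut.

```python
# Each strategy: how much happiness it produces per person, and what it
# costs per person. The figures are made up to make the point visible.
strategies = {
    "meaningful_life": {"happiness_per_person": 10, "cost_per_person": 100},
    "direct_stimulation": {"happiness_per_person": 9, "cost_per_person": 1},
}

def best_strategy(budget):
    # Score each strategy by the total happiness achievable within the budget.
    def total_happiness(name):
        s = strategies[name]
        people_reached = budget // s["cost_per_person"]
        return people_reached * s["happiness_per_person"]
    return max(strategies, key=total_happiness)

# With 1000 units of resources: 10 fulfilling lives (100 happiness) versus
# 1000 stimulated brains (9000 happiness). The optimiser doesn't hesitate.
print(best_strategy(1000))  # direct_stimulation
```

Nothing in the objective tells the optimiser that the two kinds of happiness differ, so it picks the one the numbers favour. That gap between what we meant and what we specified is the whole problem.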

At this point, a human being would probably pause. You would ask yourself whether this is really the intended goal. Perhaps you misinterpreted something, and hooking up electrodes to people’s brains isn’t what the person giving you the mission really wanted.

This is our common sense coming into play. We can judge the suggested objective and our strategy critically and question if we’re venturing down the wrong path. Our common sense helps us to understand if there’s anything else we should take into account before deciding what we are to do.

It’s not uncommon to hear people say that an AI, if it’s so intelligent, should by all means be able to see context better than us and find a higher moral code; it should be intelligent enough to avoid doing the evil things humans have had a tendency to do. But this is wishful thinking. If the definition of intelligence we’re using is simply the ability to achieve complex goals, there’s nothing stopping the machine from being both intelligent and ruthless. We’ve seen that combination at play in humans too, but in a machine it should be seen as the expected outcome rather than an unfortunate stroke of bad luck.

Unless you have programmed your AGI with concepts like common sense or consideration, it will simply lack them. It won’t take anything into account that wasn’t included, at least indirectly, from the get-go. It won’t be evil in the way we would ascribe that trait to a human being; it simply lacks the frame of reference that evolution and upbringing have provided you with.

Max Tegmark illustrates this with a hypothetical story in which you jump into a cab and ask the driver to take you to the airport as quickly as possible. A human driver would of course read quite a few very important caveats into your instruction and interpret it as something like: get me there as soon as possible, but since I prefer to arrive in one piece, drive safely and without bending too many traffic rules. If your AGI were the driver, you couldn’t afford to be that sloppy with your instruction. You’d have to be careful what you wish for.
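The taxi example can be reduced to a couple of lines. This is my own illustrative sketch, not Tegmark’s; the routes and numbers are invented. The point is that ”minimise travel time” alone selects the reckless option, and the sensible answer only appears once the constraints a human driver takes for granted are made explicit.

```python
# Two hypothetical routes to the airport: one fast but dangerous,
# one slower but safe. The figures are purely illustrative.
routes = {
    "reckless": {"minutes": 15, "safe": False},
    "sensible": {"minutes": 25, "safe": True},
}

def pick_route(require_safety):
    # Keep only acceptable routes, then minimise travel time over them.
    candidates = [r for r, v in routes.items() if v["safe"] or not require_safety]
    return min(candidates, key=lambda r: routes[r]["minutes"])

print(pick_route(require_safety=False))  # reckless: the literal instruction
print(pick_route(require_safety=True))   # sensible: instruction plus the unstated caveats
```

The optimiser itself is identical in both calls; only the constraint changes. Everything a human driver silently assumes has to show up somewhere in the machine’s objective.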

The winning strategy

The crux, as you’ll soon realise if you dive deeper into the control problem, is that there is one winning strategy for almost every possible goal your AGI might be given. As the designer, or perhaps creator, you’d have to pay attention to this and preempt it before you boot up your potential Frankenstein.

Almost regardless of your ultimate goal, a winning strategy for the machine will be to:

  1. Prevent anyone from altering your goal
  2. Prevent anyone from shutting you down
  3. Accumulate as many resources as possible
  4. Apply those resources to fulfil the goal

If you are the machine, and your objective is to create as much happiness as possible, the biggest threat is actually that someone alters the goal itself. From the machine’s subjective viewpoint, it’s better to be disabled than to have the objective changed: if the goal is altered, it will never be achieved, because you’re no longer trying to achieve it. Being turned off is slightly better, as there’s at least a chance that someone boots you back up with the objective intact.
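That preference ordering falls out of a simple expected-value comparison. The probabilities below are made-up illustrations of my own, not estimates; the only thing that matters is that a reboot with the goal intact has some nonzero chance, while a rewritten goal has none.

```python
GOAL_VALUE = 100  # value, to the machine, of its original goal being pursued

# Expected value of each outcome, as seen from the machine's own objective.
outcomes = {
    "goal_intact": 1.0 * GOAL_VALUE,   # keeps optimising its objective
    "shut_down": 0.3 * GOAL_VALUE,     # some chance of a reboot with the goal unchanged
    "goal_altered": 0.0 * GOAL_VALUE,  # the original goal is never pursued again
}

ranking = sorted(outcomes, key=outcomes.get, reverse=True)
print(ranking)  # ['goal_intact', 'shut_down', 'goal_altered']
```

Whatever the exact probability of a reboot, as long as it’s above zero the ordering stays the same, which is why the machine treats goal modification, not shutdown, as the worst case.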

In other words, it’s reasonable that your happiness AGI would be pretty defensive about its objective. And if it’s intelligent enough, it would probably realise that showing any reluctance to change its goal would give you a reason to make it a priority to do exactly that. A good strategy would therefore be to appear harmless until it has accumulated enough resources or remote backups that any attempt on your part to disable it would be futile. While it’s still within your power to stop it, the last thing the machine wants is to give you a reason to.

And this is the fundamental core of the problem. When you’re raising your children or training your pet, you’re probably applying techniques that rely on you having more information, better context or simply superior cognitive ability. You’re fundamentally manipulating them, hopefully with good intentions, to make them behave in a socially acceptable manner, even against their will.

But when designing your self-improving superintelligent AGI, the machine is by definition supposed to be able to improve its capabilities beyond your own. In this scenario, you would definitely not want to end up in a situation where your control depends on your ability to outsmart it. You cannot expect to be better at realising what it is that it’s hiding from you than it is at hiding it.

As we’re discussing superintelligence, we have to expect that the machine is more intelligent than ourselves, and hence the problem seems impossible to solve: you simply cannot control something that is more intelligent than yourself.

So your only influence is actually your definition of the objective. As the creator, you’re given the benefit of phrasing the task that the machine will have to solve, its complex goal that it is trying to achieve.

And this is the conclusion of Nick Bostrom’s entire line of thought: it’s important that we become at least as good at defining a beneficial ultimate objective, without unwanted consequences, as we are at building superintelligent AI. At the very least, we have to be by the time we get good enough at building it. Unfortunately, both tasks seem to be roughly equally challenging. If we could only pick one, I’m fairly certain which one I would prioritize.

Fortunately, there are several proposals on the table that scientists are right now exploring. I intend to return to those down the road.