6. Bostrom (1997).
7. There might be some possible implementations of a reinforcement learning mechanism that would, when the AI discovers the wireheading solution, lead to a safe incapacitation rather than to infrastructure profusion. The point is that this could easily go wrong and fail for unexpected reasons.
8. This was suggested by Marvin Minsky (vide Russell and Norvig [2010, 1039]).
9. The issue of which kinds of digital mind would be conscious, in the sense of having subjective phenomenal experience, or “qualia” in philosopher-speak, is important in relation to this point (though it is irrelevant to many other parts of this book). One open question is how hard it would be to accurately estimate how a human-like being would behave in various circumstances without simulating its brain in enough detail that the simulation is conscious. Another question is whether there are generally useful algorithms for a superintelligence, for instance reinforcement-learning techniques, such that the implementation of these algorithms would generate qualia. Even if we judge the probability that any such subroutines would be conscious to be fairly small, the number of instantiations might be so large that even a small risk that they might experience suffering ought to be accorded significant weight in our moral calculation. See also Metzinger (2003, Chap. 8).
10. Bostrom (2002a, 2003a); Elga (2004).
CHAPTER 9: THE CONTROL PROBLEM
1. E.g., Laffont and Martimort (2002).
2. Suppose a majority of voters want their country to build some particular kind of superintelligence. They elect a candidate who promises to do their bidding, but they might find it difficult to ensure that the candidate, once in power, will follow through on her campaign promise and pursue the project in the way that the voters intended. Supposing she is true to her word, she instructs her government to contract with an academic or industry consortium to carry out the work; but again there are agency problems: the bureaucrats in the government department might have their own views about what should be done and may implement the project in a way that respects the letter but not the spirit of the leader’s instructions. Even if the government department does its job faithfully, the contracted scientific partners might have their own separate agendas. The problem recurs on many levels. The director of one of the participating laboratories might lie awake worrying about a technician introducing an unsanctioned element into the design—imagining Dr. T. R. Eason sneaking into his office late one night, logging into the project code base, rewriting a part of the seed AI’s goal system. Where it was supposed to say “serve humanity,” it now says “serve Dr. T. R. Eason.”
3. Even for superintelligence development, though, there could be a role for behavioral testing—as one auxiliary element within a wider battery of safety measures. Should an AI misbehave in its developmental phase, something is clearly awry—though, importantly, the converse does not hold.
4. In a classic exploit from 1975, Steven Dompier wrote a program for the Altair 8800 that took advantage of this effect (and the absence of shielding around the microcomputer’s case). Running the program caused the emission of electromagnetic waves that would produce music when one held a transistor radio close to the computer (Driscoll 2012). The young Bill Gates, who attended a demo, reported that he was impressed and mystified by the hack (Gates 1975). There are in any case plans to design future chips with built-in Wi-Fi capabilities (Greene 2012).
5. It is no light matter to have held a conviction, which, had we had an opportunity to act upon it, could have resulted in the ruination of all our cosmic endowment. Perhaps one could argue for the following principle: if somebody has in the past been certain on N occasions that a system has been improved sufficiently to make it safe, and each time it was revealed that they were wrong, then on the next occasion they are not entitled to assign a credence greater than 1/(N + 1) to the system being safe.
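As a worked instance of the rule just proposed (the numbers are purely illustrative): someone who has been wrongly certain of a system's safety on three past occasions would, on the fourth occasion, be entitled to a credence of at most

\[
\frac{1}{N+1} = \frac{1}{3+1} = \frac{1}{4}.
\]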
6. In one informal experiment, the role of the AI was played by an intelligent human. Another individual played the role of gatekeeper and was tasked with not letting the AI out of the box. The AI could communicate with the gatekeeper only by text and was given two hours to persuade the gatekeeper to let it out. In three cases out of five, with different individuals playing the gatekeeper, the AI escaped (Yudkowsky 2002). What a human can do, a superintelligence can do too. (The reverse, of course, does not hold. Even if the task for a real superintelligence were harder—maybe the gatekeepers would be more strongly motivated to refrain from releasing the AI than the individuals playing gatekeeper in the experiment—the superintelligence might still succeed where a human would fail.)
7. One should not overstate the marginal amount of safety that could be gained in this way. Mental imagery can substitute for graphical display. Consider the impact books can have on people—and books are not even interactive.
8. See also Chalmers (2010). It would be a mistake to infer from this that there is no possible use in building a system that will never be observed by any outside entity. One might place a final value on what goes on inside such a system. Also, other people might have preferences about what goes on inside such a system, and might therefore be influenced by its creation or the promise of its creation. Knowledge of the existence of certain kinds of isolated systems (ones containing observers) can also induce anthropic uncertainty in outside observers, which may influence their behavior.
9. One might wonder why social integration is considered a form of capability control. Should it not instead be classified as a motivation selection method on the ground that it involves seeking to influence a system’s behavior by means of incentives? We will look closely at motivation selection presently; but, in answer to this question, we are construing motivation selection as a cluster of control methods that work by selecting or shaping a system’s final goals—goals sought for their own sakes rather than for instrumental reasons. Social integration does not target a system’s final goals, so it is not motivation selection. Rather, social integration aims to limit the system’s effective capabilities: it seeks to render the system incapable of achieving a certain set of outcomes—outcomes in which the system attains the benefits of defection without suffering the associated penalties (retribution, and loss of the gains from collaboration). The hope is that by limiting which outcomes the system is able to attain, the system will find that the most effective remaining means of realizing its final goals is to behave cooperatively.
10. This approach may be somewhat more promising in the case of an emulation believed to have anthropomorphic motivations.
11. I owe this idea to Carl Shulman.
12. Creating a cipher certain to withstand a superintelligent code-breaker is a nontrivial challenge. For example, traces of random numbers might be left in some observer’s brain or in the microstructure of the random generator, from whence the superintelligence can retrieve them; or, if pseudorandom numbers are used, the superintelligence might guess or discover the seed from which they were generated. Further, the superintelligence could build large quantum computers, or even discover unknown physical phenomena that could be used to construct new kinds of computers.
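As a toy sketch (not from the text) of why a guessable seed undermines an otherwise strong scheme: if the keystream comes from an ordinary pseudorandom generator with a small seed space, an adversary who can enumerate seeds recovers the plaintext outright. The module, seed range, and message below are arbitrary choices for the illustration.

```python
# Toy sketch: a "one-time pad" whose keystream is derived from a PRNG with a
# guessable seed offers no secrecy against an adversary who can search seeds.
import random

def keystream(seed, n):
    rng = random.Random(seed)                 # fully determined by the seed
    return bytes(rng.randrange(256) for _ in range(n))

def xor(data, key):
    return bytes(a ^ b for a, b in zip(data, key))

plaintext = b"launch code: 0000"
secret_seed = 4242                             # drawn from a small seed space
ciphertext = xor(plaintext, keystream(secret_seed, len(plaintext)))

# The code-breaker simply enumerates candidate seeds and keeps any
# decryption that consists entirely of printable characters.
for guess in range(10_000):
    candidate = xor(ciphertext, keystream(guess, len(ciphertext)))
    if all(32 <= c < 127 for c in candidate):
        print(guess, candidate)
        break
```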
13. The AI could wire itself to believe that it had received the reward tokens, but this should not make it wirehead if it is designed to want the reward tokens (as opposed to wanting to be in a state in which it has certain beliefs about the reward tokens).
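A minimal sketch, in made-up code, of the distinction this note draws: an agent whose goal is defined over the actual reward tokens gains nothing by editing its beliefs, whereas an agent whose goal is defined over its own belief state does.

```python
# Contrast a goal defined over the world with one defined over the agent's
# beliefs. "Wireheading" (rewriting the belief) helps only the second agent.
world_tokens = 3          # reward tokens that actually exist
believed_tokens = 3       # the agent's internal estimate

def utility_wants_tokens(world, belief):
    return world          # cares about the actual tokens

def utility_wants_belief(world, belief):
    return belief         # cares about its own belief state

# Wirehead: inflate the belief without changing the world.
believed_tokens = 1_000_000

print(utility_wants_tokens(world_tokens, believed_tokens))   # still 3
print(utility_wants_belief(world_tokens, believed_tokens))   # inflated
```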
14. For the original article, see Bostrom (2003a). See also Elga (2004).
15. Shulman (2010a).
16. Basement-level reality presumably contains more computational resources than simulated reality, since any computational processes occurring in a simulation are also occurring on the computer running the simulation. Basement-level reality might also contain a wealth of other physical resources which could be hard for simulated agents to access—agents that exist only at the indulgence of powerful simulators who may have other uses in mind for those resources. (Of course, the inference here is not strictly deductively valid: in principle, it could be the case that universes in which simulations are run contain so much more resources that simulated civilizations on average have access to more resources than non-simulated civilizations, even though each non-simulated civilization that runs simulations has more resources than all the civilizations it simulates combined.)
17. There are various further esoteric considerations that might bear on this matter, the implications of which have not yet been fully analyzed. These considerations may ultimately be crucially important in developing an all-things-considered approach to dealing with the prospect of an intelligence explosion. However, it seems unlikely that we will succeed in figuring out the practical import of such esoteric arguments unless we have first made some progress on the more mundane kinds of consideration that are the topic of most of this book.
18. Cf., e.g., Quine and Ullian (1978).
19. Which an AI might investigate by considering the performance characteristics of various basic computational functionalities, such as the size and capacity of various data buses, the time it takes to access different parts of memory, the incidence of random bit flips, and so forth.
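A crude illustration of the sort of probe this note describes, assuming only that rough timing measurements are available: comparing sequential with random traversal of a large buffer gives a coarse signature of the memory hierarchy. In an interpreted language much of the effect is masked by interpreter overhead, so this only shows the shape of such an investigation.

```python
# Rough sketch: probe the memory hierarchy by timing sequential versus
# random traversal of a large buffer.
import random
import time

N = 10_000_000
buf = bytearray(N)

def timed_sum(indices):
    start = time.perf_counter()
    total = 0
    for i in indices:
        total += buf[i]
    return time.perf_counter() - start, total

sequential = range(N)
shuffled = list(range(N))
random.shuffle(shuffled)

t_seq, _ = timed_sum(sequential)
t_rand, _ = timed_sum(shuffled)
print(f"sequential: {t_seq:.3f}s, random: {t_rand:.3f}s")
```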
20. Perhaps the prior could be (a computable approximation of) the Solomonoff prior, which assigns probability to possible worlds on the basis of their algorithmic complexity. See Li and Vitányi (2008).
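A toy sketch of the idea, not Solomonoff's actual construction (which is uncomputable): over a finite set of candidate descriptions, weight each hypothesis by two raised to the negative of its description length, then normalize. The hypothesis names and description strings are invented for the illustration.

```python
# Toy complexity-weighted prior over a finite hypothesis set: each hypothesis
# is identified with a program-like description string, and shorter
# descriptions receive exponentially more prior probability.
hypotheses = {
    "world-A": "0101",                  # short description -> high prior
    "world-B": "01011100",
    "world-C": "0101110011010110",
}

weights = {name: 2.0 ** -len(code) for name, code in hypotheses.items()}
total = sum(weights.values())
prior = {name: w / total for name, w in weights.items()}

for name, p in prior.items():
    print(f"{name}: {p:.4f}")
```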
21. The moment after the conception of deception, the AI might contrive to erase the trace of its mutinous thought. It is therefore important that this tripwire operate continuously. It would also be good practice to use a “flight recorder” that stores a complete trace of all the AI’s activity (including exact timing of keyboard input from the programmers), so that its trajectory can be retraced or analyzed following an automatic shutdown. The information could be stored on a write-once-read-many medium.
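A minimal sketch of the logging pattern, with hypothetical file and field names; genuine write-once-read-many guarantees come from the storage hardware, not from code like this.

```python
# Minimal sketch of an append-only activity log ("flight recorder"):
# timestamp every event and never rewrite past entries. Real WORM storage
# requires suitable hardware; this only illustrates the recording pattern.
import json
import time

LOG_PATH = "ai_activity.log"   # hypothetical location

def record(event: str, detail: dict) -> None:
    entry = {"t": time.time(), "event": event, "detail": detail}
    with open(LOG_PATH, "a", encoding="utf-8") as f:   # append-only mode
        f.write(json.dumps(entry) + "\n")
        f.flush()

record("keyboard_input", {"source": "programmer", "text": "run diagnostics"})
record("tripwire_check", {"status": "ok"})
```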
22. Asimov (1942). To the three laws was later added a “Zeroth Law”: “(0) A robot may not harm humanity, or, by inaction, allow humanity to come to harm” (Asimov 1985).
23. Cf. Gunn (1982).
24. Russell (1986, 161f).
25. Similarly, although some philosophers have spent entire careers trying to carefully formulate deontological systems, new cases and consequences occasionally come to light that necessitate revisions. For example, deontological moral philosophy has in recent years been reinvigorated through the discovery of a fertile new class of philosophical thought experiments, “trolley problems,” which reveal many subtle interactions among our intuitions about the moral significance of the acts/omissions distinction, the distinction between intended and unintended consequences, and other such matters; see, e.g., Kamm (2007).
26. Armstrong (2010).
27. As a rule of thumb, if one plans to use multiple safety mechanisms to contain an AI, it may be wise to work on each one as if it were intended to be the sole safety mechanism and as if it were therefore required to be individually sufficient. If one puts a leaky bucket inside another leaky bucket, the water still comes out.
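A small arithmetic illustration, with made-up failure rates, of why the leaky-bucket warning matters: stacking mechanisms helps greatly only if their failures are independent, and a shared weakness erodes most of the apparent benefit.

```python
# Illustrative arithmetic (invented failure rates): two containment mechanisms
# that each fail 10% of the time give a 1% combined failure rate only if
# their failures are independent; a single shared flaw defeats both at once.
p_fail = 0.10

independent = p_fail * p_fail                    # both fail independently: 0.01
p_common_cause = 0.05                            # chance one flaw defeats both
correlated = p_common_cause + (1 - p_common_cause) * p_fail * p_fail

print(f"independent failures: {independent:.3f}")
print(f"with a common flaw:   {correlated:.3f}")
```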
28. A variation of the same idea is to build the AI so that it is continuously motivated to act on its best guesses about what the implicitly defined standard is. In this setup, the AI’s final goal is always to act on the implicitly defined standard, and it pursues an investigation into what this standard is only for instrumental reasons.
CHAPTER 10: ORACLES, GENIES, SOVEREIGNS, TOOLS
1. These names are, of course, anthropomorphic and should not be taken seriously as analogies. They are just meant as labels for some prima facie different concepts of possible system types that one might consider trying to build.
2. In response to a question about the outcome of the next election, one would not wish to be served with a comprehensive list of the projected position and momentum vectors of nearby particles.
3. Indexed to a particular instruction set on a particular machine.
4. Kuhn (1962); de Blanc (2011).
5. It would be harder to apply such a “consensus method” to genies or sovereigns, because there may often be numerous sequences of basic actions (such as sending particular patterns of electrical signals to the system’s actuators) that would be almost exactly equally effective at achieving a given objective; whence slightly different agents may legitimately choose slightly different actions, resulting in a failure to reach consensus. By contrast, with appropriately formulated questions, there would usually be a small number of suitable answer options (such as “yes” and “no”). (On the concept of a Schelling point, also referred to as a “focal point,” see Schelling [1980].)
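A hedged sketch of such a consensus method for question-answering systems: several independently constructed answerers receive the same yes/no question, and an answer is released only if they all agree. The oracle functions below are mere placeholders standing in for distinct builds.

```python
# Sketch of the "consensus method" for oracle-style answers: an answer is
# released only when independently built oracles coincide. The oracle
# functions are placeholders for separate AI builds.
def oracle_a(question: str) -> str:
    return "yes"

def oracle_b(question: str) -> str:
    return "yes"

def oracle_c(question: str) -> str:
    return "no"

def consensus_answer(question, oracles):
    answers = {oracle(question) for oracle in oracles}
    if len(answers) == 1:
        return answers.pop()          # unanimous: release the answer
    return None                       # disagreement: withhold output

print(consensus_answer("Will the bridge design hold?", [oracle_a, oracle_b]))
print(consensus_answer("Will the bridge design hold?", [oracle_a, oracle_b, oracle_c]))
```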