Superintelligence: Paths, Dangers, Strategies
Author: Nick Bostrom
Consider P(w | Ey). Specifying this conditional probability is not strictly part of the value-loading problem. In order to be intelligent, the AI must already have some way of deriving reasonably accurate probabilities over many relevant factual possibilities. A system that falls too far short on this score will not pose the kind of danger that concerns us here. However, there may be a risk that the AI will end up with an epistemology that is good enough to make the AI instrumentally effective yet not good enough to enable it to think correctly about some possibilities that are of great normative importance. (The problem of specifying P(w | Ey) is in this way related to the problem of specifying 𝒲, the class of possible worlds.) Specifying P(w | Ey) also requires confronting other issues, such as how to represent uncertainty over logical impossibilities.
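To make this concrete, here is a minimal toy sketch (in Python) of a conditional distribution P(w | Ey) over a small class of possible worlds, obtained by Bayesian conditioning on the evidence. The world labels, prior, and likelihood are invented purely for illustration and are not from the book.

```python
# Toy illustration (not from the book): a small class of possible worlds
# and a conditional distribution P(w | Ey) obtained by Bayesian conditioning.
from typing import Callable, Dict

# Prior over three stand-in possible worlds, P(w).
PRIOR: Dict[str, float] = {"w1": 0.5, "w2": 0.3, "w3": 0.2}

def posterior(likelihood: Callable[[str], float],
              prior: Dict[str, float]) -> Dict[str, float]:
    """Compute P(w | Ey) proportional to P(Ey | w) * P(w)."""
    unnormalized = {w: likelihood(w) * p for w, p in prior.items()}
    z = sum(unnormalized.values())
    return {w: u / z for w, u in unnormalized.items()}

# Invented likelihood P(Ey | w): how probable the evidence stream Ey
# (percepts so far plus the contemplated action y) is in each world.
likelihood_Ey = lambda w: {"w1": 0.1, "w2": 0.6, "w3": 0.3}[w]

P_w_given_Ey = posterior(likelihood_Ey, PRIOR)
print(P_w_given_Ey)
```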
The aforementioned issues—how to define a class of possible actions, a class of possible worlds, and a likelihood distribution connecting evidence to classes of possible worlds—are quite generic: similar issues arise for a wide range of formally specified agents. It remains to examine a set of issues more peculiar to the value learning approach; namely, how to define 𝒰, V(U), and P(V(U) | w).
𝒰 is a class of utility functions. There is a connection between 𝒰 and 𝒲 inasmuch as each utility function U(w) in 𝒰 should ideally assign utilities to each possible world w in 𝒲. But 𝒰 also needs to be wide in the sense of containing sufficiently many and diverse utility functions for us to have justified confidence that at least one of them does a good job of representing the intended values.
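As a purely illustrative sketch of what a class 𝒰 of candidate utility functions might look like, continuing the toy example above, each candidate below assigns a utility to every world in the toy world class; the names and numbers are invented.

```python
# Toy illustration: a small class of candidate utility functions, each of
# which assigns a utility to every possible world in the toy world class.
from typing import Callable, Dict, List

WORLDS: List[str] = ["w1", "w2", "w3"]

# Each candidate U(w) is total over WORLDS, as the note requires; a real
# class would need to be far wider and more diverse.
CANDIDATE_UTILITIES: Dict[str, Callable[[str], float]] = {
    "U_hedonic":    lambda w: {"w1": 1.0, "w2": 0.2, "w3": 0.5}[w],
    "U_preference": lambda w: {"w1": 0.4, "w2": 0.9, "w3": 0.1}[w],
    "U_ideal":      lambda w: {"w1": 0.7, "w2": 0.7, "w3": 0.8}[w],
}

# Sanity check: every candidate utility function is defined on every world.
for U in CANDIDATE_UTILITIES.values():
    assert all(isinstance(U(w), float) for w in WORLDS)
```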
The reason for writing P(V(U) | w) rather than simply P(U | w) is to emphasize the fact that probabilities are assigned to propositions. A utility function, per se, is not a proposition, but we can transform a utility function into a proposition by making some claim about it. For example, we may claim of a particular utility function U(.) that it describes the preferences of a particular person, or that it represents the prescriptions implied by some ethical theory, or that it is the utility function that the principal would have wished to have implemented if she had thought things through. The “value criterion” V(.) can thus be construed as a function that takes as its argument a utility function U and gives as its value a proposition to the effect that U satisfies the criterion V. Once we have defined a proposition V(U), we can hopefully obtain the conditional probability P(V(U) | w) from whatever source we used to obtain the other probability distributions in the AI. (If we are certain that all normatively relevant facts are taken into account in individuating the possible worlds 𝒲, then P(V(U) | w) should equal zero or one in each possible world.) The question remains how to define V. This is discussed further in the text.
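Putting the pieces together, the sketch below shows one way the quantities discussed in this note could be combined to evaluate a contemplated action: within each world, candidate utility functions are weighted by the probability that they satisfy the value criterion V, and worlds are weighted by P(w | Ey). The combination rule is a reconstruction in the spirit of the value learning formalism rather than a quotation of the book's formula, and all numbers and names are invented.

```python
# Toy reconstruction (not a quotation of the book's formula): the value of a
# contemplated action is estimated as
#   sum over w of P(w | Ey) * sum over U of P(V(U) | w) * U(w),
# i.e. candidate utility functions are weighted, within each world, by the
# probability that they satisfy the value criterion V in that world.
from typing import Callable, Dict

P_w_given_Ey: Dict[str, float] = {"w1": 0.1, "w2": 0.6, "w3": 0.3}

CANDIDATE_UTILITIES: Dict[str, Callable[[str], float]] = {
    "U_hedonic":    lambda w: {"w1": 1.0, "w2": 0.2, "w3": 0.5}[w],
    "U_preference": lambda w: {"w1": 0.4, "w2": 0.9, "w3": 0.1}[w],
}

# P(V(U) | w): in each world, the probability that the proposition
# "U satisfies the value criterion V" is true. If the worlds individuate
# all normatively relevant facts, these numbers would be 0 or 1.
P_V_given_w: Dict[str, Dict[str, float]] = {
    "w1": {"U_hedonic": 1.0, "U_preference": 0.0},
    "w2": {"U_hedonic": 0.0, "U_preference": 1.0},
    "w3": {"U_hedonic": 1.0, "U_preference": 0.0},
}

def value_of_action() -> float:
    """Expected value of the contemplated action, given evidence Ey."""
    return sum(
        p_w * P_V_given_w[w][name] * U(w)
        for w, p_w in P_w_given_Ey.items()
        for name, U in CANDIDATE_UTILITIES.items()
    )

print(value_of_action())
```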
20. These are not the only challenges for the value learning approach. Another issue, for instance, is how to get the AI to have sufficiently sensible initial beliefs—at least by the time it becomes strong enough to subvert the programmers’ attempts to correct it.
21. Yudkowsky (2001).
22. The term is taken from American football, where a “Hail Mary” is a very long forward pass made in desperation, typically when the time is nearly up, on the off chance that a friendly player might catch the ball near the end zone and score a touchdown.
23. The Hail Mary approach relies on the idea that a superintelligence could articulate its preferences with greater exactitude than we humans can articulate ours. For example, a superintelligence could specify its preferences as code. So if our AI is representing other superintelligences as computational processes that are perceiving their environment, then our AI should be able to reason about how those alien superintelligences would respond to some hypothetical stimulus, such as a “window” popping up in their visual field presenting them with the source code of our own AI and asking them to specify their instructions to us in some convenient pre-specified format. Our AI could then read off these imaginary instructions (from its own model of this counterfactual scenario wherein these alien superintelligences are represented), and we would have built our AI so that it would be motivated to follow those instructions.
24. An alternative would be to create a detector that looks (within our AI’s world model) for (representations of) physical structures created by a superintelligent civilization. We could then bypass the step of identifying the hypothesized superintelligences’ preference functions, and give our own AI the final value of trying to copy whatever physical structures it believes superintelligent civilizations tend to produce.