A couple of general points should be borne in mind when thinking about how the MR proposal could be refined. First, we might start conservatively, using the fallback option to cover almost all contingencies and using the “morally right” option only in those that we feel we fully understand. Second, we might add to the MR proposal the general modulator that it is to be “interpreted charitably, and revised as we would have revised it if we had thought more carefully about it before we wrote it down, etc.”
22. Of these terms, “knowledge” might seem the one most readily susceptible to a formal analysis (in information-theoretic terms). However, to represent what it is for a human to know something, the AI may need a sophisticated set of representations relating to complex psychological properties. A human being does not “know” all the information that is stored somewhere in her brain.
23. One indicator that the terms in CEV are (marginally) less opaque is that it would count as philosophical progress if we could analyze moral rightness in terms like those used in CEV. In fact, one of the main strands in metaethics—ideal observer theory—purports to do just that. See, e.g., Smith et al. (1989).
24. This requires confronting the problem of fundamental normative uncertainty. It can be shown that it is not always appropriate to act according to the moral theory that has the highest probability of being true. It can also be shown that it is not always appropriate to perform the action that has the highest probability of being right. Some way of trading probabilities against “degrees of wrongness” or the severity of the issues at stake seems to be needed. For some ideas in this direction, see Bostrom (2009a).
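To make the point concrete, here is a minimal numerical sketch (not drawn from the text; the theories, credences, and choiceworthiness scores are invented for illustration) of how the action favored by the most probable moral theory can differ from the action that maximizes probability-weighted choiceworthiness:

```python
# Toy example of decision-making under moral uncertainty. All values are
# hypothetical and chosen only to make the structural point vivid.

# Credence assigned to each moral theory (sums to 1).
credences = {"theory_A": 0.6, "theory_B": 0.4}

# Choiceworthiness of each action according to each theory. Theory A mildly
# prefers act_1; theory B regards act_1 as gravely wrong.
choiceworthiness = {
    "act_1": {"theory_A": 10, "theory_B": -1000},
    "act_2": {"theory_A": 9, "theory_B": 5},
}

def expected_choiceworthiness(action):
    return sum(credences[t] * choiceworthiness[action][t] for t in credences)

# The most probable theory (A) favors act_1, yet act_2 maximizes expected
# choiceworthiness, because it hedges against the severe wrongness act_1
# would have if theory B turned out to be true.
print({a: expected_choiceworthiness(a) for a in choiceworthiness})
# approximately: {'act_1': -394.0, 'act_2': 7.4}
```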
25. It could possibly even be argued that it is an adequacy condition for any explication of the notion of moral rightness that it account for how Joe Sixpack is able to have some idea of right and wrong.
26. It is not obvious that the morally right thing for us to do is to build an AI that implements MR, even if we assume that the AI itself would always act morally. Perhaps it would be objectionably hubristic or arrogant of us to build such an AI (especially since many people may disapprove of that project). This issue can be partially finessed by tweaking the MR proposal. Suppose we stipulate that the AI should act (that is, do what it would be morally right for it to do) only if it was morally right for its creators to have built the AI in the first place; otherwise it should shut itself down. It is hard to see how we would be committing any grave moral wrong in creating that kind of AI, since if it were wrong for us to create it, the only consequence would be that an AI was created that immediately shut itself down, assuming that the AI had committed no mind crime up to that point. (We might nevertheless have acted wrongly—for instance, by having failed to seize the opportunity to build some other AI instead.)
A second issue is supererogation. Suppose there are many actions the AI could take, each of which would be morally right—in the sense of being morally permissible—yet some of which are morally better than the others. One option is to have the AI aim to select the morally best action in any such situation (or one of the best actions, in case there are several that are equally good). Another option is to have the AI select, from among the morally permissible actions, one that maximally satisfies some other (non-moral) desideratum. For example, the AI could select, from among the actions that are morally permissible, the action that our CEV would prefer it to take. Such an AI, while never doing anything that is morally impermissible, might protect our interests more than an AI that does what is morally best.
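As a toy sketch of the two selection rules just described (a hedged illustration only; the predicates and scoring functions named below are hypothetical placeholders, not anything defined in the text):

```python
# Option 1: pick a morally best action outright.
def pick_morally_best(actions, moral_goodness):
    # moral_goodness: hypothetical function scoring actions by moral betterness.
    return max(actions, key=moral_goodness)

# Option 2: restrict attention to the morally permissible actions, then pick
# the one that our coherent extrapolated volition (CEV) would most prefer.
# Assumes at least one permissible action exists.
def pick_cev_preferred_permissible(actions, is_morally_permissible, cev_preference):
    permissible = [a for a in actions if is_morally_permissible(a)]
    return max(permissible, key=cev_preference)
```

Under the second rule the AI never acts impermissibly, but within the permissible set it is guided by a non-moral desideratum (here, CEV preference) rather than by moral betterness.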
27. When the AI evaluates the moral permissibility of our act of creating the AI, it should interpret permissibility in its objective sense. In one ordinary sense of “morally permissible,” a doctor acts morally permissibly when she prescribes a drug she believes will cure her patient—even if the patient, unbeknownst to the doctor, is allergic to the drug and dies as a result. Focusing on objective moral permissibility takes advantage of the presumably superior epistemic position of the AI.
28. More directly, it depends on the AI’s beliefs about which ethical theory is true (or, more precisely, on its probability distribution over ethical theories).
29. It can be difficult to imagine how superlatively wonderful these physically possible lives might be. See Bostrom (2008c) for a poetic attempt to convey some sense of this. See Bostrom (2008b) for an argument that some of these possibilities could be good for us, that is, good for existing human beings.
30. It might seem deceptive or manipulative to promote one proposal if one thinks that some other proposal would be better. But one could promote it in ways that avoid insincerity. For example, one could freely acknowledge the superiority of the ideal while still promoting the non-ideal as the best attainable compromise.
31. Or some other positively evaluative term, such as “good,” “great,” or “wonderful.”
32. This echoes a principle in software design known as “Do What I Mean,” or DWIM. See Teitelman (1966).
33. Goal content, decision theory, and epistemology are three aspects that should be elucidated; but we do not intend to beg the question of whether there must be a neat decomposition into these three separate components.
34. An ethical project ought presumably to allocate at most a modest portion of the eventual benefits that the superintelligence produces as special rewards to those who contributed in morally permissible ways to the project’s success. Allocating a great portion to the incentive wrapping scheme would be unseemly. It would be analogous to a charity that spends 90% of its income on performance bonuses for its fundraisers and on advertising campaigns to increase donations.
35. How could the dead be rewarded? One can think of several possibilities. At the low end, there could be memorial services and monuments, which would be a reward insofar as people desired posthumous fame. The deceased might also have other preferences about the future that could be honored, for instance concerning cultures, arts, buildings, or natural environments. Furthermore, most people care about their descendants, and special privileges could be granted to the children and grandchildren of contributors. More speculatively, the superintelligence might be able to create relatively faithful simulations of some past people—simulations that would be conscious and that would resemble the original sufficiently to count as a form of survival (according to at least some people’s criteria). This would presumably be easier for people who have been placed in cryonic suspension; but perhaps for a superintelligence it would not be impossible to recreate something quite similar to the original person from other preserved records such as correspondence, publications, audiovisual materials and digital records, or the personal memories of other survivors. A superintelligence might also think of some possibilities that do not readily occur to us.
36. On Pascalian mugging, see Bostrom (2009b). For an analysis of issues related to infinite utilities, see Bostrom (2011a). On fundamental normative uncertainty, see, e.g., Bostrom (2009a).
37. E.g., Price (1991); Joyce (1999); Drescher (2006); Yudkowsky (2010); Dai (2009).
38. E.g., Bostrom (2009a).
39. It is also conceivable that using indirect normativity to specify the AI’s goal content would mitigate the problems that might arise from an incorrectly specified decision theory. Consider, for example, the CEV approach. If it were implemented well, it might be able to compensate for at least some errors in the specification of the AI’s decision theory. The implementation could make the values that our coherent extrapolated volition would want the AI to pursue depend on the AI’s decision theory. If our idealized selves knew they were making value specifications for an AI that was using a particular kind of decision theory, they could adjust their value specifications so as to make the AI behave benignly despite its warped decision theory—much like one can cancel out the distorting effects of one lens by placing another lens in front of it that distorts oppositely.
40. Some epistemological systems may, in a holistic manner, have no distinct foundation. In that case, the constitutional inheritance is not a distinct set of principles, but rather, as it were, an epistemic starting point that embodies certain propensities to respond to incoming streams of evidence.
41. See, e.g., the problem of distortion discussed in Bostrom (2011a).
42. For instance, one disputed issue in anthropic reasoning is whether the so-called self-indication assumption should be accepted. The self-indication assumption states, roughly, that from the fact that you exist you should infer that hypotheses according to which larger numbers N of observers exist should receive a probability boost proportional to N. For an argument against this principle, see the “Presumptuous Philosopher” gedanken experiment in Bostrom (2002a). For a defense of the principle, see Olum (2002); and for a critique of that defense, see Bostrom and Ćirković (2003). Beliefs about the self-indication assumption might affect various empirical hypotheses of potentially crucial strategic relevance, via considerations such as the Carter–Leslie doomsday argument, the simulation argument, and “great filter” arguments. See Bostrom (2002a, 2003a, 2008a); Carter (1983); Ćirković et al. (2010); Hanson (1998d); Leslie (1996); Tegmark and Bostrom (2005). A similar point could be made with regard to other fraught issues in observation selection theory, such as whether the choice of reference class can be relativized to observer-moments, and if so how.
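As a rough illustration of the probability boost the self-indication assumption prescribes (a toy Bayesian sketch; the hypotheses, priors, and observer counts are arbitrary made-up values):

```python
# Toy rendering of the self-indication assumption (SIA) as roughly stated above:
# a hypothesis positing N observers gets its prior weighted by N.
priors = {"few_observers": 0.5, "many_observers": 0.5}
observer_counts = {"few_observers": 1e9, "many_observers": 1e18}

# SIA-weighted posterior: P(h | I exist) is proportional to P(h) * N_h.
weights = {h: priors[h] * observer_counts[h] for h in priors}
total = sum(weights.values())
posterior = {h: w / total for h, w in weights.items()}

print(posterior)
# approximately {'few_observers': 1e-09, 'many_observers': 1.0}
```

With equal priors, nearly all credence lands on the hypothesis positing vastly more observers, which is the pattern the “Presumptuous Philosopher” case is meant to challenge.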
43. See, e.g., Howson and Urbach (1993). There are also some interesting results that narrow the range of situations in which two Bayesian agents can rationally disagree when their opinions are common knowledge; see Aumann (1976) and Hanson (2006).
44. Cf. the concept of a “last judge” in Yudkowsky (2004).
45. There are many important issues outstanding in epistemology, some mentioned earlier in the text. The point here is that we may not need to get all the solutions exactly right in order to achieve an outcome that is practically indiscernible from the best outcome. A mixture model (which throws together a wide range of diverse priors) might work.

CHAPTER 14: THE STRATEGIC PICTURE
1. This principle is introduced in Bostrom (2009b, 190), where it is also noted that it is not tautological. For a visual analogy, picture a box with large but finite volume, representing the space of basic capabilities that could be obtained through some possible technology. Imagine sand being poured into this box, representing research effort. How you pour the sand determines where it piles up in the box. But if you keep on pouring, the entire space eventually gets filled.
2. Bostrom (2002b).
3. This is not the perspective from which science policy has traditionally been viewed. Harvey Averch describes science and technology policy in the United States between 1945 and 1984 as having been centered on debates about the optimum level of public investment in the S&T enterprise and on the extent to which the government should attempt to “pick winners” in order to achieve the greatest increase in the nation’s economic prosperity and military strength. In these calculations, technological progress is always assumed to be good. But Averch also describes the rise of critical perspectives, which question the “progress is always good” premiss (Averch 1985). See also Graham (1997).
4. Bostrom (2002b).
5. This is of course by no means tautological. One could imagine a case being made for a different order of development. It could be argued that it would be better for humanity to confront some less difficult challenge first, say the development of nanotechnology, on grounds that this would force us to develop better institutions, become more internationally coordinated, and mature in our thinking about global strategy. Perhaps we would be more likely to rise to a challenge that presents a less metaphysically confusing threat than machine superintelligence. Nanotechnology (or synthetic biology, or whichever lesser challenge we confront first) might then serve as a footstool that would help us ascend to the capability level required to deal with the higher-level challenge of superintelligence.
Such an argument would have to be assessed on a case-by-case basis. For example, in the case of nanotechnology, one would have to consider various possible consequences such as the boost in hardware performance from nanofabricated computational substrates; the effects of cheap physical capital for manufacturing on economic growth; the proliferation of sophisticated surveillance technology; the possibility that a singleton might emerge through the direct or indirect effects of a nanotechnology breakthrough; and the greater feasibility of neuromorphic and whole brain emulation approaches to machine intelligence. It is beyond the scope of our investigation to consider all these issues (or the parallel issues that might arise for other existential risk-causing technologies). Here we just point out the prima facie case for favoring a superintelligence-first sequence of development—while stressing that there are complications that might alter this preliminary assessment in some cases.