Superintelligence: Paths, Dangers, Strategies
Author: Nick Bostrom
Rationales for CEV
 

Yudkowsky’s article offered seven arguments for the CEV approach. Three of these were basically different ways of making the point that while the aim should be to do something that is humane and helpful, it would be very difficult to lay down an explicit set of rules that does not have unintended interpretations and undesirable consequences.[14] The CEV approach is meant to be robust and self-correcting; it is meant to capture the source of our values instead of relying on us correctly enumerating and articulating, once and for all, each of our essential values.

The remaining four arguments go beyond that first basic (but important) point, spelling out desiderata on candidate solutions to the value-specification problem and suggesting that CEV meets these desiderata.

“Encapsulate moral growth”

This is the desideratum that the solution should allow for the possibility of moral progress. As suggested earlier, there are reasons to believe that our current moral beliefs are flawed in many ways; perhaps deeply flawed. If we were to stipulate a specific and unalterable moral code for the AI to follow, we would in effect be locking in our present moral convictions, including their errors, destroying any hope of moral growth. The CEV approach, by contrast, allows for the possibility of such growth because it has the AI try to do that which we would have wished it to do if we had developed further under favorable conditions, and it is possible that, if we had thus developed, our moral beliefs and sensibilities would have been purged of their current defects and limitations.

“Avoid hijacking the destiny of humankind”

Yudkowsky has in mind a scenario in which a small group of programmers creates a seed AI that then grows into a superintelligence that obtains a decisive strategic advantage. In this scenario, the original programmers hold in their hands the entirety of humanity’s cosmic endowment. Obviously, this is a hideous responsibility for any mortal to shoulder. Yet it is not possible for the programmers to completely shirk the onus once they find themselves in this situation: any choice they make, including abandoning the project, would have world-historical consequences. Yudkowsky sees CEV as a way for the programmers to avoid arrogating to themselves the privilege or burden of determining humanity’s future. By setting up a dynamic that implements humanity’s coherent extrapolated volition—as opposed to their own volition, or their own favorite moral theory—they in effect distribute their influence over the future to all of humanity.

“Avoid creating a motive for modern-day humans to fight over the initial dynamic”

Distributing influence over humanity’s future is not only morally preferable to the programming team implementing their own favorite vision; it is also a way to reduce the incentive to fight over who gets to create the first superintelligence. In the CEV approach, the programmers (or their sponsors) exert no more influence over the content of the outcome than any other person—though they of course play a starring causal role in determining the structure of the extrapolation and in deciding to implement humanity’s CEV instead of some alternative. Avoiding conflict is important not only because of the immediate harm that conflict tends to cause but also because conflict hinders collaboration on the difficult challenge of developing superintelligence safely and beneficially.

CEV is meant to be capable of commanding wide support. This is not just because it allocates influence equitably. There is also a deeper ground for the irenic potential of CEV, namely that it enables many different groups to hope that their preferred vision of the future will prevail totally. Imagine a member of the Afghan Taliban debating with a member of the Swedish Humanist Association. The two have very different worldviews, and what is a utopia for one might be a dystopia for the other. Nor might either be thrilled by any compromise position, such as permitting girls to receive an education but only up to ninth grade, or permitting Swedish girls to be educated but Afghan girls not. However, both the Taliban and the Humanist might be able to endorse the principle that the future should be determined by humanity’s CEV. The Taliban could reason that if his religious views are in fact correct (as he is convinced they are) and if good grounds for accepting these views exist (as he is also convinced) then humankind would in the end come to accept these views if only people were less prejudiced and biased, if they spent more time studying scripture, if they could more clearly understand how the world works and recognize essential priorities, if they could be freed from irrational rebelliousness and cowardice, and so forth.[15] The Humanist, similarly, would believe that under these idealized conditions, humankind would come to embrace the principles she espouses.

“Keep humankind ultimately in charge of its own destiny”

We might not want an outcome in which a paternalistic superintelligence watches over us constantly, micromanaging our affairs with an eye towards optimizing every detail in accordance with a grand plan. Even if we stipulate that the superintelligence would be perfectly benevolent, and free from presumptuousness, arrogance, overbearingness, narrow-mindedness, and other human shortcomings, one might still resent the loss of autonomy entailed by such an arrangement. We might prefer to create our destiny as we go along, even if it means that we sometimes fumble. Perhaps we want the superintelligence to serve as a safety net, to support us when things go catastrophically wrong, but otherwise to leave us to fend for ourselves.

CEV allows for this possibility. CEV is meant to be an “initial dynamic,” a process that runs once and then replaces itself with whatever the extrapolated volition wishes. If humanity’s extrapolated volition wishes that we live under the supervision of a paternalistic AI, then the CEV dynamic would create such an AI and hand it the reins. If humanity’s extrapolated volition instead wishes that a democratic human world government be created, then the CEV dynamic might facilitate the establishment of such an institution and otherwise remain invisible. If humanity’s extrapolated volition is instead that each person should get an endowment of resources that she can use as she pleases so long as she respects the equal rights of others, then the CEV dynamic could make this come true by operating in the background much like a law of nature, to prevent trespass, theft, assault, and other nonconsensual impingements.[16]

The structure of the CEV approach thus allows for a virtually unlimited range of outcomes. It is also conceivable that humanity’s extrapolated volition would wish that the CEV does nothing at all. In that case, the AI implementing CEV should, upon having established with sufficient probability that this is what humanity’s extrapolated volition would wish it to do, safely shut itself down.
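
To make the “initial dynamic” idea concrete, here is a minimal illustrative sketch in Python. It is not from the text: the names (extrapolate_volition, Wish, build_successor) and the confidence threshold are hypothetical stand-ins, and the extrapolation step itself is left as an unimplemented placeholder. The sketch only shows the control flow described above: run once, hand over to whatever successor the extrapolated volition endorses, or do nothing and shut down if no wish is established with sufficient probability.

```python
# Illustrative sketch only: the CEV "initial dynamic" as a run-once process.
# All names and thresholds here are hypothetical, not from the book.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Wish:
    kind: str          # e.g. "paternalistic_ai", "world_government", "background_guardian", "nothing"
    confidence: float  # estimated probability that this is what the extrapolated volition wants

def extrapolate_volition(extrapolation_base) -> Wish:
    """Placeholder for the (unsolved) extrapolation step."""
    raise NotImplementedError

def build_successor(wish: Wish) -> object:
    """Placeholder: construct whatever regime (AI, institution, background rule) the wish specifies."""
    raise NotImplementedError

def initial_dynamic(extrapolation_base, min_confidence: float = 0.95) -> Optional[object]:
    """Run once, then replace itself with whatever the extrapolated volition wishes."""
    wish = extrapolate_volition(extrapolation_base)
    if wish.confidence < min_confidence:
        return None               # not established with sufficient probability: take no irreversible action
    if wish.kind == "nothing":
        return None               # the extrapolated volition wants no successor: safely shut down
    return build_successor(wish)  # hand the reins to the endorsed successor regime
```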

Further remarks
 

The CEV proposal, as outlined above, is of course the merest schematic. It has a number of free parameters that could be specified in various ways, yielding different versions of the proposal.

One parameter is the extrapolation base: Whose volitions are to be included? We might say “everybody,” but this answer spawns a host of further questions. Does the extrapolation base include so-called “marginal persons” such as embryos, fetuses, brain-dead persons, patients with severe dementias or who are in permanent vegetative states? Does each of the hemispheres of a “split-brain” patient get its own weight in the extrapolation and is this weight the same as that of the entire brain of a normal subject? What about people who lived in the past but are now dead? People who will be born in the future? Higher animals and other sentient creatures? Digital minds? Extraterrestrials?

One option would be to include only the population of adult human beings on Earth who are alive at the time of the AI’s creation. An initial extrapolation from this base could then decide whether and how the base should be expanded. Since the number of “marginals” at the periphery of this base is relatively small, the result of the extrapolation may not depend much on exactly where the boundary is drawn—on whether, for instance, it includes fetuses or not.

That somebody is excluded from the original extrapolation base does not imply that their wishes and well-being are disregarded. If the coherent extrapolated volition of those in the extrapolation base (e.g. living adult human beings) wishes that moral consideration be extended to other beings, then the outcome of the CEV dynamic would reflect that preference. Nevertheless, it is possible that the interests of those who are included in the original extrapolation base would be accommodated to a greater degree than the interests of outsiders. In particular, if the dynamic acts only where there is broad agreement between individual extrapolated volitions (as in Yudkowsky’s original proposal), there would seem to be a significant risk of an ungenerous blocking vote that could prevent, for instance, the welfare of nonhuman animals or digital minds from being protected. The result might potentially be morally rotten.[17]
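
The worry about an ungenerous blocking vote can be pictured with a small sketch. Everything in it is hypothetical (the 90% agreement threshold, the vote representation); it merely illustrates how a rule of acting only on broad agreement lets a sizeable dissenting bloc veto protections for outsiders such as nonhuman animals or digital minds.

```python
# Illustrative sketch of an "act only on broad agreement" rule and the blocking vote it enables.
# The threshold and the proposal/vote representation are hypothetical.

def dynamic_acts_on(votes: list[bool], agreement_threshold: float = 0.9) -> bool:
    """Act only if a supermajority of individual extrapolated volitions endorse the proposal."""
    if not votes:
        return False
    return sum(votes) / len(votes) >= agreement_threshold

# Example: 85% of extrapolated volitions endorse protecting digital minds,
# but the 15% who withhold support are enough to block the dynamic from acting.
votes = [True] * 85 + [False] * 15
print(dynamic_acts_on(votes))  # False
```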

One motivation for the CEV proposal was to avoid creating a motive for humans to fight over the creation of the first superintelligent AI. Although the CEV proposal scores better on this desideratum than many alternatives, it does not entirely eliminate motives for conflict. A selfish individual, group, or nation might seek to enlarge its slice of the future by keeping others out of the extrapolation base.

A power grab of this sort might be rationalized in various ways. It might be argued, for instance, that the sponsor who funds the development of the AI deserves to own the outcome. This moral claim is probably false. It could be objected, for example, that the project that launches the first successful seed AI imposes a vast risk externality on the rest of humanity, which therefore is entitled to compensation. The amount of compensation owed is so great that it can only take the form of giving everybody a stake in the upside if things turn out well.[18]

Another argument that might be used to rationalize a power grab is that large segments of humanity have base or evil preferences and that including them in the extrapolation base would risk turning humanity’s future into a dystopia. It is difficult to know the share of good and bad in the average person’s heart. It is also difficult to know how much this balance varies between different groups, social strata, cultures, or nations. Whether one is optimistic or pessimistic about human nature, one may prefer not to wager humanity’s cosmic endowment on the speculation that, for a sufficient majority of the seven billion people currently alive, their better angels would prevail in their extrapolated volitions. Of course, omitting a certain set of people from the extrapolation base does not guarantee that light would triumph; and it might well be that the souls that would soonest exclude others or grab power for themselves tend rather to contain unusually large amounts of darkness.

Yet another reason for fighting over the initial dynamic is that one might believe that somebody else’s AI will not work as advertised, even if the AI is billed as a way to implement humanity’s CEV. If different groups have different beliefs about which implementation is most likely to succeed, they might fight to prevent the others from launching. It would be better in such situations if the competing projects could settle their epistemic differences by some method that more reliably ascertains who is right than the method of armed conflict.[19]

Morality models
 

The CEV proposal is not the only possible form of indirect normativity. For example, instead of implementing humanity’s coherent extrapolated volition, one could try to build an AI with the goal of doing what is morally right, relying on the AI’s superior cognitive capacities to figure out just which actions fit that description. We can call this proposal “moral rightness” (MR). The idea is that we humans have an imperfect understanding of what is right and wrong, and perhaps an even poorer understanding of how the concept of moral rightness is to be philosophically analyzed: but a superintelligence could understand these things better.[20]

What if we are not sure whether moral realism is true? We could still attempt the MR proposal. We would just have to make sure to specify what the AI should do in the eventuality that its presupposition of moral realism is false. For example, we could stipulate that if the AI estimates with a sufficient probability that there are no suitable non-relative truths about moral rightness, then it should revert to implementing coherent extrapolated volition instead, or simply shut itself down.[21]
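
The fallback clause amounts to a simple decision rule, sketched below for illustration only (the credence estimate, the threshold, and the mode names are hypothetical placeholders): pursue moral rightness if moral realism is judged sufficiently probable, otherwise revert to CEV or shut down, whichever fallback was stipulated.

```python
# Illustrative decision rule for the MR proposal's fallback clause.
# The probability estimate, threshold, and mode names are hypothetical placeholders.

def choose_mode(p_moral_realism: float,
                realism_threshold: float = 0.5,
                fallback: str = "CEV") -> str:
    """Pursue moral rightness only if suitable non-relative moral truths are judged sufficiently probable."""
    if p_moral_realism >= realism_threshold:
        return "MR"    # act on what is morally right, as best the AI can determine it
    return fallback    # otherwise revert to CEV, or shut down if that is the stipulated fallback

print(choose_mode(0.8))                        # "MR"
print(choose_mode(0.2))                        # "CEV"
print(choose_mode(0.2, fallback="shutdown"))   # "shutdown"
```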

MR appears to have several advantages over CEV. MR would do away with various free parameters in CEV, such as the degree of coherence among extrapolated volitions that is required for the AI to act on the result, the ease with which a majority can overrule dissenting minorities, and the nature of the social environment within which our extrapolated selves are to be supposed to have “grown up farther together.” It would seem to eliminate the possibility of a moral failure resulting from the use of an extrapolation base that is too narrow or too wide. Furthermore, MR would orient the AI toward morally right action even if our coherent extrapolated volitions happen to wish for the AI to take actions that are morally odious. As noted earlier, this seems a live possibility with the CEV proposal. Moral goodness might be more like a precious metal than an abundant element in human nature, and even after the ore has been processed and refined in accordance with the prescriptions of the CEV proposal, who knows whether the principal outcome will be shining virtue, indifferent slag, or toxic sludge?
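
To make the comparison concrete, one might picture CEV’s free parameters as explicit settings that any implementation would have to fix. The sketch below is purely hypothetical (none of these field names or defaults appear in the text); it simply names the knobs that the MR proposal, as described, would dispense with.

```python
# Hypothetical listing of free parameters the CEV proposal leaves open,
# which the MR proposal would do away with. Field names and defaults are illustrative only.

from dataclasses import dataclass

@dataclass
class CEVParameters:
    extrapolation_base: str = "adult humans alive at the AI's creation"  # whose volitions are included
    coherence_required: float = 0.9          # degree of agreement among extrapolated volitions before the AI acts
    majority_overrule_ease: float = 0.1      # how easily a majority can overrule dissenting minorities
    growth_environment: str = "unspecified"  # the social setting in which we "grow up farther together"
```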

MR would also appear to have some disadvantages. It relies on the notion of “morally right,” a notoriously difficult concept, one with which philosophers have grappled since antiquity without yet attaining consensus as to its analysis. Picking an erroneous explication of “moral rightness” could result in outcomes that would be morally very wrong. This difficulty of defining “moral rightness” might seem to count heavily against the MR proposal. However, it is not clear that the MR proposal is really at a material disadvantage in this regard. The CEV proposal, too, uses terms and concepts that are difficult to explicate (such as “knowledge,” “being more the people we wished we were,” “grown up farther together,” among others).[22] Even if these concepts are marginally less opaque than “moral rightness,” they are still miles removed from anything that programmers can currently express in code.[23] The path to endowing an AI with any of these concepts might involve giving it general linguistic ability (comparable, at least, to that of a normal human adult). Such a general ability to understand natural language could then be used to understand what is meant by “morally right.” If the AI could grasp the meaning, it could search for actions that fit. As the AI develops superintelligence, it could then make progress on two fronts: on the philosophical problem of understanding what moral rightness is, and on the practical problem of applying this understanding to evaluate particular actions.[24] While this would not be easy, it is not clear that it would be any more difficult than extrapolating humanity’s coherent extrapolated volition.[25]
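
The “search for actions that fit” step can be pictured with a toy selection loop. It is entirely hypothetical: estimate_rightness stands in for the AI’s philosophical and empirical work, which is precisely the hard part the passage describes; the sketch only shows the surrounding structure of scoring candidate actions and acting when the best estimate clears a confidence bar.

```python
# Toy sketch of searching for actions that fit the concept of "morally right".
# estimate_rightness is a stand-in for the AI's (unsolved) philosophical and empirical work.

from typing import Callable, Optional, Sequence

def select_action(candidates: Sequence[str],
                  estimate_rightness: Callable[[str], float],
                  min_score: float = 0.9) -> Optional[str]:
    """Return the candidate judged most morally right, or None if no candidate is judged right enough."""
    if not candidates:
        return None
    best = max(candidates, key=estimate_rightness)
    return best if estimate_rightness(best) >= min_score else None
```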
