The value-loading problem looks somewhat different for whole brain emulation than it does for artificial intelligence. Methods that presuppose a fine-grained understanding and control of algorithms and architecture are not applicable to emulations. On the other hand, the augmentation motivation selection method—inapplicable to de novo artificial intelligence—is available to be used with emulations (or enhanced biological brains).[28]
The augmentation method could be combined with techniques to tweak the inherited goals of the system. For example, one could try to manipulate the motivational state of an emulation by administering the digital equivalent of psychoactive substances (or, in the case of biological systems, the actual chemicals). Even now it is possible to pharmacologically manipulate values and motivations to a limited extent.[29] The pharmacopeia of the future may contain drugs with more specific and predictable effects. The digital medium of emulations should greatly facilitate such developments, by making controlled experimentation easier and by rendering all cerebral parts directly addressable.
Just as when biological test subjects are used, research on emulations would get entangled in ethical complications, not all of which could be brushed aside with a consent form. Such entanglements could slow progress along the emulation path (because of regulation or moral restraint), perhaps especially hindering studies on how to manipulate the motivational structure of emulations. The result could be that emulations are augmented to potentially dangerous superintelligent levels of cognitive ability before adequate work has been done to test or adjust their final goals. Another possible effect of the moral entanglements might be to give the lead to less scrupulous teams and nations. Conversely, were we to relax our moral standards for experimenting with digital human minds, we could become responsible for a substantial amount of harm and wrongdoing, which is obviously undesirable. Other things equal, these considerations favor taking some alternative path that does not require the extensive use of digital human research subjects in a strategically high-stakes situation.
The issue, however, is not clear-cut. One could argue that whole brain emulation research is less likely to involve moral violations than artificial intelligence research, on the grounds that we are more likely to recognize when an emulation mind qualifies for moral status than we are to recognize when a completely alien or synthetic mind does so. If certain kinds of AIs, or their subprocesses, have a significant moral status that we fail to recognize, the consequent moral violations could be extensive. Consider, for example, the happy abandon with which contemporary programmers create reinforcement-learning agents and subject them to aversive stimuli. Countless such agents are created daily, not only in computer science laboratories but in many applications, including some computer games containing sophisticated non-player characters. Presumably, these agents are still too primitive to have any moral status. But how confident can we really be that this is so? More importantly, how confident can we be that we will know to stop in time, before our programs become capable of experiencing morally relevant suffering?
(We will return in Chapter 14 to some of the broader strategic questions that arise when we compare the desirability of emulation and artificial intelligence paths.)
Some intelligent systems consist of intelligent parts that are themselves capable of agency. Firms and states exemplify this in the human world: whilst largely composed of humans, they can, for some purposes, be viewed as autonomous agents in their own right. The motivations of such composite systems depend not only on the motivations of their constituent subagents but also on how those subagents are organized. For instance, a group that is organized under strong dictatorship might behave as if it had a will that was identical to the will of the subagent that occupies the dictator role, whereas a democratic group might sometimes behave more as if it had a will that was a composite or average of the wills of its various constituents. But one can also imagine governance institutions that would make an organization behave in a way that is not a simple function of the wills of its subagents. (Theoretically, at least, there could exist a totalitarian state that everybody hated, because the state had mechanisms to prevent its citizens from coordinating a revolt. Each citizen could be worse off by revolting alone than by playing their part in the state machinery.)
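To make the point about organization concrete, here is a toy sketch (my illustration, not from the text): the composite's effective will is a function both of the subagents' preferences and of the aggregation rule the institution implements. All names and numbers below are invented for the example.

```python
# Toy model (illustrative assumption, not from the book): the effective will of
# a composite agent depends on its subagents' preferences *and* on the
# institutional rule that aggregates them.

from statistics import mean

# Each subagent's preference is a point on a single policy dimension (toy scale 0-1).
subagent_preferences = {"a": 0.9, "b": 0.2, "c": 0.3, "d": 0.4}

def dictatorship(prefs: dict[str, float], dictator: str) -> float:
    """The composite simply adopts the dictator's will."""
    return prefs[dictator]

def democracy(prefs: dict[str, float]) -> float:
    """The composite's will is an average of its constituents' wills."""
    return mean(prefs.values())

def coordination_trap(prefs: dict[str, float]) -> float:
    """An institution whose output is not a simple function of the wills it
    contains: everyone prefers something in the 0.2-0.9 range, yet the
    mechanism locks in a policy nobody wants, because no citizen can
    profitably deviate alone."""
    return 0.0

print(dictatorship(subagent_preferences, "a"))  # 0.9
print(democracy(subagent_preferences))          # 0.45
print(coordination_trap(subagent_preferences))  # 0.0
```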
By designing appropriate institutions for a composite system, one could thus try to shape its effective motivation. In Chapter 9, we discussed social integration as a possible capability control method. But there we focused on the incentives faced by an agent as a consequence of its existence in a social world of near-equals. Here we are focusing on what happens inside a given agent: how its will is determined by its internal organization. We are therefore looking at a motivation selection method. Moreover, since this kind of internal institution design does not depend on large-scale social engineering or reform, it is a method that might be available to an individual project developing superintelligence even if the wider socioeconomic or international milieu is less than ideally favorable.
Institution design is perhaps most plausible in contexts where it would be combined with augmentation. If we could start with agents that are already suitably motivated or that have human-like motivations, institutional arrangements could be used as an extra safeguard to increase the chances that the system will stay on course.
For example, suppose that we start with some well-motivated human-like agents—let us say emulations. We want to boost the cognitive capacities of these agents, but we worry that the enhancements might corrupt their motivations. One way to deal with this challenge would be to set up a system in which individual emulations function as subagents. When a new enhancement is introduced, it is first applied to a small subset of the subagents. Its effects are then studied by a review panel composed of subagents who have not yet had the enhancement applied to them. Only when these peers have satisfied themselves that the enhancement is not corrupting is it rolled out to the wider subagent population. If the enhanced subagents are found to be corrupted, they are not given further enhancements and are excluded from key decision-making functions (at least until the system as a whole has advanced to a point where the corrupted subagents can be safely reintegrated).[30] Although the corrupted subagents might have gained some advantage from the enhancement, the institutional structure in which they are embedded, and the fact that they constitute a small minority of all subagents, would hopefully prevent them from seizing power or spreading their corruption to the overall system. Thus, the collective intelligence and capability of the system could be gradually enhanced in a sequence of small steps, where the soundness of each step is verified by subagents only slightly less capable than the new subagents that are introduced in that step.
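As a purely illustrative sketch of the procedure just described (the class names, the corruption probability, the cohort size, and the reviewers' detection rate are all assumptions introduced for the example, not details from the text), one round of the protocol might look like this: trial the enhancement on a small cohort, let unenhanced peers review the cohort, quarantine anyone the panel judges corrupted, and roll out only if the cohort is certified.

```python
# Minimal sketch of the staged-enhancement protocol described above.
# All specifics (corruption risk, detection rate, cohort size) are
# illustrative assumptions.

import random
from dataclasses import dataclass

@dataclass
class Subagent:
    capability: int = 0
    corrupted: bool = False
    quarantined: bool = False

def apply_enhancement(agent: Subagent, corruption_risk: float = 0.05) -> None:
    """Boost capability; with some probability the agent's goals drift."""
    agent.capability += 1
    if random.random() < corruption_risk:
        agent.corrupted = True

def passes_review(candidate: Subagent, panel: list[Subagent]) -> bool:
    """Unenhanced peers test the candidate's motivations. The behavioural
    test is assumed to be imperfect: a reviewer spots corruption only with
    probability `detection_rate`."""
    detection_rate = 0.9
    votes = [not (candidate.corrupted and random.random() < detection_rate)
             for _ in panel]
    return sum(votes) > len(votes) / 2

def enhancement_round(population: list[Subagent], trial_size: int = 5) -> None:
    active = [a for a in population if not a.quarantined]
    trial, panel = active[:trial_size], active[trial_size:]
    for agent in trial:
        apply_enhancement(agent)
    for agent in trial:
        if not passes_review(agent, panel):
            agent.quarantined = True      # excluded from key decision-making
    # Roll out to the wider population only if the trial cohort is certified.
    if not any(a.quarantined for a in trial):
        for agent in panel:
            apply_enhancement(agent)

population = [Subagent() for _ in range(100)]
for _ in range(10):                       # ten small enhancement steps
    enhancement_round(population)
```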
The amount of safety obtainable by this kind of institution design is limited by the accuracy of the tests used to evaluate the enhanced subagents. The problem of test construction is complicated by the fact that the test batteries can be validated only within the pre-existing range of cognitive ability, yet must be applied to agents outside that range. One could mitigate this problem by taking small enhancement steps and by letting each test run for a long time. Such caution, however, would raise costs and slow progress (which, if a race dynamic is occurring, could mean that a project employing these safety measures would place itself at a disadvantage). Even with small steps and extensive testing, some risk would remain of subtle flaws being introduced, leading to a gradual corruption of the goal system that might become manifest only when the system reaches superintelligence. This risk could be reduced by extending the testing over many steps of cognitive enhancement, so that even agents that had initially been certified as sound, and that had received several subsequent enhancements, would continue to be reviewed by a panel of unmodified and uncorrupted original peers and would remain subject to reversion.
Ideally, the testing should be ongoing rather than confined to an initial evaluation period. Human-level subagents have the ability to strategize and might thus choose to conceal certain goals while their behavior was being monitored. To enable continuous testing, a hierarchy might be created in which subagents at a certain level of capability are tasked with monitoring subagents at a somewhat higher level of capability. At the bottom of the capability hierarchy (but at the top of the power hierarchy) would sit the relatively dumb and slow principal. This human principal would be like a demented king who reigns over an incompetent court that oversees a mediocre administration which governs a capable people. Alternatively, he might be likened to the evolutionarily ancient “reptilian brain” which ensures that the newer and much cleverer neocortex is employed in the service of goals such as feeding and copulation.
One might question whether such an inverse meritocracy could remain stable. If the bulk of the superintelligent subagents were found to be corrupt, and the humans in charge of the whole thing attempted to roll back many of the preceding enhancements, would the institutional structure be strong enough to withstand the opposition that such a move would engender? In the context of historical or contemporary human societies, the answer is perhaps that this would be implausible. (Not that the most capable people consistently rise to power, but we are not literally ruled by chimpanzees either.) The issue is less clear, however, in a context where society is implemented as software.[31]
Imagine a digital hierarchy that has one highly trained police agent for every two proletarians. It also has one special-police agent for every two police agents. It also has one special-special-police agent for every two special-police agents—and so forth, so that each layer of overseers has at least half the numbers of the layer it oversees. Imagine, further, that this society is architected in such a way that a supervisor has big advantages over his subordinates. For example, the supervisor is able to monitor and record everything his subordinates say and do, whereas the supervisor himself is hidden behind a one-way glass, so that his subordinates can only hear the orders and instructions that the superior chooses to transmit. Imagine that supervisors have a panel of buttons that they can press at any time to cause a subordinate to be punished or rewarded, paused, deleted, or reverted to an earlier state. These capabilities would greatly boost the powers of a supervising agent, beyond what has been possible historically in even the most totalitarian of organizations.
Even this does not exhaust the possible instruments of control. Supervisors might also have at their disposal realistic virtual reality simulations that they can use to probe subordinates’ reactions to hypothetical scenarios (such as simulated offers to join a conspiracy or to slack when they think nobody is watching). Supervisors might, additionally, have the ability to eavesdrop on internal monologues in the subordinates’ minds and to directly manipulate their emotional states. The upshot is a series of massive advantages for the supervisor.[32]
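The shape of such a structure is easy to sketch in code. The following illustrates only the 1:2 oversight pyramid and the control actions described above; the class names, methods, and the way interventions are recorded are assumptions made for the example, not a design from the text.

```python
# Illustrative sketch (not from the book) of the layered oversight pyramid and
# the supervisor's control panel described above.

from enum import Enum, auto

class Action(Enum):
    REWARD = auto()
    PUNISH = auto()
    PAUSE = auto()
    REVERT = auto()   # roll a subordinate back to an earlier saved state
    DELETE = auto()

class Agent:
    def __init__(self, name: str):
        self.name = name
        self.log: list[str] = []          # everything the agent says and does

class Supervisor(Agent):
    """One supervisor per two subordinates; sees everything, is seen only
    through the orders it chooses to transmit (the 'one-way glass')."""
    def __init__(self, name: str, subordinates: list[Agent]):
        super().__init__(name)
        self.subordinates = subordinates

    def monitor(self) -> dict[str, list[str]]:
        return {s.name: list(s.log) for s in self.subordinates}

    def intervene(self, target: Agent, action: Action) -> None:
        # A real system would actually pause, revert, etc.; here we only record it.
        target.log.append(f"received {action.name} from supervisor")

def build_pyramid(num_proletarians: int) -> list[list[Agent]]:
    """Each oversight layer has half the population of the layer below."""
    layers: list[list[Agent]] = [[Agent(f"prole-{i}") for i in range(num_proletarians)]]
    while len(layers[-1]) > 1:
        below = layers[-1]
        layers.append([Supervisor(f"sup{len(layers)}-{i}", below[2 * i:2 * i + 2])
                       for i in range((len(below) + 1) // 2)])
    return layers
```

The pyramid terminates in a single top-level supervisor; the relatively dumb and slow human principal described earlier would sit above that.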
Each supervisor would be a subordinate to another, higher-level supervisor, who would monitor his every move and make sure he did not slacken in the execution of his managerial duties. With this kind of arrangement, it might be possible for the structure to be stable even if each supervisory level is intellectually somewhat weaker than the level it supervises.[33] And the entire structure, which could include many highly superintelligent agents, may be controlled by a small number of humans with root access.[34]
The cost of such an internal control structure would be significant but not necessarily prohibitive. For example, in the model where each level of supervision has half the numbers of the layer below, the extra computational overhead is bounded at a mere 100% of what the proletarian part of the system costs—less if the dumber boss layers require fewer computations per subagent. Oversight could be tightened by increasing the ratio, or cost lowered by decreasing it. The extra computational cost would be a major factor in a competitive market scenario (cf. Chapter 11) but looks affordable in the context of a project that is not facing strong immediate competition. There would also be a cost in terms of the development time needed to create and test these supervisory functionalities. A well-resourced project could reduce this time cost by parallelizing the development of the control structure with the development of the machine intelligence; but the extra task load could be prohibitive for smaller projects and for projects caught in a close technology race.
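The 100% bound follows from the geometric series 1/2 + 1/4 + 1/8 + … < 1. A quick illustrative check (the function name and the population figures are placeholders of my own, assuming equal per-agent compute across layers):

```python
# Illustrative check of the overhead bound stated above: if each supervisory
# layer is `ratio` times the size of the layer below it, the supervisors'
# total headcount is a geometric series summing to at most ratio / (1 - ratio)
# of the proletarian headcount, i.e. at most 100% when ratio = 1/2.

def supervisory_overhead(num_proletarians: int, ratio: float = 0.5) -> float:
    """Total supervisor count as a fraction of the proletarian count."""
    total_supervisors = 0.0
    layer = num_proletarians * ratio
    while layer >= 1:
        total_supervisors += layer
        layer *= ratio
    return total_supervisors / num_proletarians

print(supervisory_overhead(1_000_000))        # ~1.0  -> roughly 100% extra compute
print(supervisory_overhead(1_000_000, 1/3))   # ~0.5  -> cheaper, but looser oversight
```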
One other type of cost also deserves consideration: the risk of mind crimes being committed in this kind of structure.[35] As described, the institution sounds like a rather horrible North Korean labor camp. Yet there are ways of at least mitigating the moral problems with running this kind of institution, even if the subagents contained in the institution are emulations with full human moral status. At a minimum, the system could rely on volunteering emulations. Each subagent could have the option at any time of withdrawing its participation.[36] Terminated emulations could be stored to memory, with a commitment to restart them under much more ideal conditions once the dangerous phase of the intelligence explosion is over. Meanwhile, subagents who chose to participate could be housed in very comfortable virtual environments and allowed ample time for sleep and recreation. These measures would impose a cost, one that should be manageable for a well-resourced project under noncompetitive conditions. In a highly competitive situation, the cost may be unaffordable unless an enterprise could be assured that its competitors would incur the same cost.