Read The Singularity Is Near: When Humans Transcend Biology Online
Authors: Ray Kurzweil
Tags: #Non-Fiction, #Fringe Science, #Retail, #Technology, #Amazon.com
To Train the Neural Net
- Run repeated recognition trials on sample problems.
- After each trial, adjust the synaptic strengths of all the interneuronal connections to improve the performance of the neural net on this trial (see the discussion below on how to do this).
- Continue this training until the accuracy rate of the neural net is no longer improving (i.e., reaches an asymptote).
Key Design Decisions
In the simple schema above, the designer of this neural-net algorithm needs to determine at the outset:
- What the input numbers represent.
- The number of layers of neurons.
- The number of neurons in each layer. (Each layer does not necessarily need to have the same number of neurons.)
- The number of inputs to each neuron in each layer. The number of inputs (i.e., interneuronal connections) can also vary from neuron to neuron and from layer to layer.
- The actual “wiring” (i.e., the connections). For each neuron in each layer, this consists of a list of other neurons, the outputs of which constitute the
inputs to this neuron. This represents a key design area. There are a number of possible ways to do this:
(i) Wire the neural net randomly; or
(ii) Use an evolutionary algorithm (see below) to determine an optimal wiring; or
(iii) Use the system designer’s best judgment in determining the wiring.
- The initial synaptic strengths (i.e., weights) of each connection. There are a number of possible ways to do this:
(i) Set the synaptic strengths to the same value; or
(ii) Set the synaptic strengths to different random values; or
(iii) Use an evolutionary algorithm to determine an optimal set of initial values; or
(iv) Use the system designer’s best judgment in determining the initial values.
- The firing threshold of each neuron.
- The output. The output can be:
(i) the outputs of layer
M
of neurons; or
(ii) the output of a single output neuron, the inputs of which are the outputs of the neurons in layer
M
;or
(iii) a function of (e.g., a sum of) the outputs of the neurons in layer
M
;or
(iv) another function of neuron outputs in multiple layers.
- How the synaptic strengths of all the connections are adjusted during the training of this neural net. This is a key design decision and is the subject of a great deal of research and discussion. There are a number of possible ways to do this:
(i) For each recognition trial, increment or decrement each synaptic strength by a (generally small) fixed amount so that the neural net’s output more closely matches the correct answer. One way to do this is to try both incrementing and decrementing and see which has the more desirable effect. This can be time-consuming, so other methods exist for making local decisions on whether to increment or decrement each synaptic strength.
(ii) Other statistical methods exist for modifying the synaptic strengths after each recognition trial so that the performance of the neural net on that trial more closely matches the correct answer.
Note that neural-net training will work even if the answers to the training trials are not all correct. This allows using real-world training data that may have an inherent error rate. One key to the success
of a neural net–based recognition system is the amount of data used for training. Usually a very substantial amount is needed to obtain satisfactory results. Just like human students, the amount of time that a neural net spends learning its lessons is a key factor in its performance.
Variations
Many variations of the above are feasible:
- There are different ways of determining the topology. In particular, the interneuronal wiring can be set either randomly or using an evolutionary algorithm.
- There are different ways of setting the initial synaptic strengths.
- The inputs to the neurons in layer
i
do not necessarily need to come from the outputs of the neurons in layer
i 1
. Alternatively, the inputs to the neurons in each layer can come from any lower layer or any layer.- There are different ways to determine the final output.
- The method described above results in an “all or nothing” (1 or 0) firing called a nonlinearity. There are other nonlinear functions that can be used. Commonly a function is used that goes from 0 to 1 in a rapid but more gradual fashion. Also, the outputs can be numbers other than 0 and 1.
- The different methods for adjusting the synaptic strengths during training represent key design decisions.
The above schema describes a “synchronous” neural net, in which each recognition trial proceeds by computing the outputs of each layer, starting with layer
0
through layer
M
. In a true parallel system, in which each neuron is operating independently of the others, the neurons can operate “asynchronously” (that is, independently). In an asynchronous approach, each neuron is constantly scanning its inputs and fires whenever the sum of its weighted inputs exceeds its threshold (or whatever its output function specifies).
173
. See
chapter 4
for a detailed discussion of brain reverse engineering. As one example of the progression, S. J. Thorpe writes: “We have really only just begun what will certainly be a long term project aimed at reverse engineering the primate visual system. For the moment, we have only explored some very simple architectures, involving essentially just feed-forward architectures involving a relatively small numbers of layers. . . . In the years to come, we will strive to incorporate as many of the computational tricks used by the primate and human visual system as possible. More to the point, it seems that by adopting the spiking neuron approach, it will soon be possible to develop sophisticated systems capable of simulating very large neuronal networks in real time.” S. J. Thorpe et al., “Reverse Engineering of the Visual System Using Networks of Spiking Neurons,”
Proceedings of the IEEE 2000 International Symposium on Circuits and Systems
IV (IEEE Press), pp. 405–8,
http://www.sccn.ucsd.edu/~arno/mypapers/thorpe.pdf
.
174
. T. Schoenauer et al. write: “Over the past years a huge diversity of hardware for artificial neural networks (ANN) has been designed. . . . Today one can choose from a wide range of neural network hardware. Designs differ in terms of architectural approaches, such as neurochips, accelerator boards and multi-board neurocomputers, as well as concerning the purpose of the system, such as the ANN algorithm(s) and the system’s versatility. . . . Digital neurohardware can be classified by the:[
sic
] system architecture, degree of parallelism, typical neural network partition per processor, inter-processor communication network and numerical representation.” T. Schoenauer, A. Jahnke, U. Roth, and H. Klar, “Digital Neurohardware: Principles and Perspectives,” in
Proc. Neuronale Netze in der Anwendung
—Neural Networks in Applications NN’98, Magdeburg, invited paper (February 1998): 101–6,
http://bwrc.eecs.berkeley.edu/People/kcamera/neural/papers/
schoenauer98digital.pdf
. See also Yihua Liao, “Neural Networks in Hardware: A Survey” (2001),
http://ailab.das.ucdavis.edu/~yihua/research/NNhardware.pdf
.
175
. Here is the basic schema for a genetic (evolutionary) algorithm. Many variations are possible, and the designer of the system needs to provide certain critical parameters and methods, detailed below.
THE EVOLUTIONARY ALGORITHM
Create N solution “creatures.” Each one has:
- A genetic code: a sequence of numbers that characterize a possible solution to the problem. The numbers can represent critical parameters, steps to a solution, rules, etc.
For each generation of evolution, do the following:
- Do the following for each of the N solution creatures:
(i) Apply this solution creature’s solution (as represented by its genetic code) to the problem, or simulated environment.
(ii) Rate the solution.
- Pick the L solution creatures with the highest ratings to survive into the next generation.
- Eliminate the (N L) nonsurviving solution creatures.
- Create (N L) new solution creatures from the L surviving solution creatures by:
(i) Making copies of the L surviving creatures. Introduce small random variations into each copy; or
(ii) Creating additional solution creatures by combining parts of the genetic code (using “sexual” reproduction, or otherwise combining portions of the chromosomes) from the L surviving creatures; or
(iii) Doing a combination of (i) and (ii).
- Determine whether or not to continue evolving:
Improvement = (highest rating in this generation) – (highest rating in the previous generation).
If Improvement- The solution creature with the highest rating from the last generation of evolution has the best solution. Apply the solution defined by its genetic code to the problem.
Key Design Decisions
In the simple schema above, the designer needs to determine at the outset:
- Key parameters:
N
L
Improvement threshold- What the numbers in the genetic code represent and how the solution is computed from the genetic code.
- A method for determining the N solution creatures in the first generation. In general, these need only be “reasonable” attempts at a solution. If these first-generation solutions are too far afield, the evolutionary algorithm may have difficulty converging on a good solution. It is often worthwhile to create the initial solution creatures in such a way that they are reasonably diverse. This will help prevent the evolutionary process from just finding a “locally” optimal solution.
- How the solutions are rated.
- How the surviving solution creatures reproduce.
Variations
Many variations of the above are feasible. For example:
- There does not need to be a fixed number of surviving solution creatures (L) from each generation. The survival rule(s) can allow for a variable number of survivors.
- There does not need to be a fixed number of new solution creatures created in each generation (N L). The procreation rules can be independent of the size of the population. Procreation can be related to survival, thereby allowing the fittest solution creatures to procreate the most.
- The decision as to whether or not to continue evolving can be varied. It can consider more than just the highest-rated solution creature from the most recent generation(s). It can also consider a trend that goes beyond just the last two generations.
176
. Sam Williams, “When Machines Breed,” August 12, 2004,
http://www.salon.com/tech/feature/2004/08/12/evolvable_hardware/
index_np.html
.
177
. Here is the basic scheme (algorithm description) of recursive search. Many variations
are possible, and the designer of the system needs to provide certain critical parameters and methods, detailed below.
THE RECURSIVE ALGORITHM
Define a function (program) “Pick Best Next Step.” The function returns a value of “SUCCESS” (we’ve solved the problem) or “FAILURE” (we didn’t solve it). If it returns with a value of SUCCESS, then the function also returns the sequence of steps that solved the problem.
PICK BEST NEXT STEP does the following:
- Determine if the program can escape from continued recursion at this point. This bullet, and the next two bullets deal with this escape decision.
First, determine if the problem has now been solved. Since this call to Pick Best Next Step probably came from the program calling itself, we may now have a satisfactory solution. Examples are:
(i) In the context of a game (for example, chess), the last move allows us to win (such as checkmate).
(ii) In the context of solving a mathematical theorem, the last step proves the theorem.
(iii) In the context of an artistic program (for example, a computer poet or composer), the last step matches the goals for the next word or note.
If the problem has been satisfactorily solved, the program returns with a value of “SUCCESS” and the sequence of steps that caused the success.