Authors: jon stokes
Tags: #Computers, #Systems Architecture, #General, #Microprocessors
second on a modern CPU, again and again and again. It’s only because the
computer executes these steps so rapidly that it’s able to present the illusion
that something much more conceptually complex is going on.
To return to our file-clerk analogy, a computer is like a file clerk who
sits at his desk all day waiting for messages from his boss. Eventually, the
boss sends him a message telling him to perform a calculation on a pair of
numbers. The message tells him which calculation to perform, and where in
his personal filing cabinet the necessary numbers are located. So the clerk
first retrieves the numbers from his filing cabinet, then performs the calculation,
and finally places the results back into the filing cabinet. It’s a boring, mindless, repetitive task that’s repeated endlessly, day in and day out, which
is precisely why we’ve invented a machine that can do it efficiently and not
complain.
The Register File
Since numbers must first be fetched from storage before they can be added,
we want our data storage space to be as fast as possible so that the operation
can be carried out quickly. Since the ALU is the part of the processor that
does the actual addition, we’d like to place the data storage as close as
possible to the ALU so it can read the operands almost instantaneously.
However, practical considerations, such as a CPU’s limited surface area,
constrain the size of the storage area that we can stick next to the ALU. This
means that in real life, most computers have a relatively small number of very
fast data storage locations attached to the ALU. These storage locations are
called registers, and the first x86 computers only had eight of them to work with.
These registers, which are arrayed in a storage structure called a register file,
store only a small subset of the data that the code stream needs (and we’ll talk
about where the rest of that data lives shortly).
Basic Computing Concepts
Building on our previous, three-step description of what goes on when a
computer’s ALU is commanded to add two numbers, we can modify it as
follows. To execute an add instruction, the ALU must perform these steps:
1. Obtain the two numbers to be added (the input operands) from two source registers.
2. Add the numbers.
3. Place the results back in a destination register.
For a concrete example, let’s look at addition on a simple computer
with only four registers, named A, B, C, and D. Suppose each of these registers
contains a number, and we want to add the contents of two registers together
and overwrite the contents of a third register with the resulting sum, as in the following operation:
Code         Comments
A + B = C    Add the contents of registers A and B, and place the result in C, overwriting whatever was there.
Upon receiving an instruction commanding it to perform this addition
operation, the ALU in our simple computer would carry out the following
three familiar steps:
1. Read the contents of registers A and B.
2. Add the contents of A and B.
3. Write the result to register C.
NOTE
You should recognize these three steps as a more specific form of the read-modify-write
sequence from earlier, where the generic modify step is replaced with an addition
operation.
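The three steps can be sketched in a few lines of Python. This is only an illustrative model of the sequence, not a description of how the hardware is built: the dictionary `registers`, the function name `alu_add`, and the starting values are all assumptions made up for the example.

```python
# Hypothetical four-register machine; the starting values are arbitrary.
registers = {"A": 2, "B": 3, "C": 0, "D": 0}

def alu_add(regs, src1, src2, dest):
    """Model the read, add, write sequence for one add instruction."""
    # Step 1: read the two input operands from the source registers.
    operand1 = regs[src1]
    operand2 = regs[src2]
    # Step 2: add the numbers.
    total = operand1 + operand2
    # Step 3: write the result to the destination register,
    # overwriting whatever was there.
    regs[dest] = total

alu_add(registers, "A", "B", "C")   # A + B = C
print(registers["C"])               # prints 5
```

Note that the source registers A and B are left unchanged; only C is overwritten.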
This three-step sequence is quite simple, but it’s at the very core of how
a microprocessor really works. In fact, if you glance ahead to Chapter 10’s
discussion of the PowerPC 970’s pipeline, you’ll see that it actually has
separate stages for each of these three operations: stage 12 is the register
read step, stage 13 is the actual execute step, and stage 14 is the write-back
step. (Don’t worry if you don’t know what a pipeline is, because that’s a topic
for Chapter 3.) So the 970’s ALU reads two operands from the register file,
adds them together, and writes the sum back to the register file. If we were
to stop our discussion right here, you’d already understand the three core
stages of the 970’s main integer pipeline—all the other stages are either just
preparation to get to this point or they’re cleanup work after it.
RAM: When Registers Alone Won’t Cut It
Obviously, four (or even eight) registers aren’t even close to the theoretically infinite storage space I mentioned earlier in this chapter. In order to make a
viable computer that does useful work, you need to be able to store very large
Chapter 1
data sets. This is where the computer’s main memory comes in. Main memory, which
in modern computers is always some type of random access memory (RAM), stores the
data set on which the computer operates, and only a small portion
of that data set at a time is moved to the registers for easy access from the
ALU (as shown in Figure 1-4).
Figure 1-4: A computer with a register file (diagram labels: Main Memory, CPU, Registers, ALU)
Figure 1-4 gives only the slightest indication of it, but main memory is
situated quite a bit farther away from the ALU than are the registers. In fact,
the ALU and the registers are internal parts of the microprocessor, but main
memory is a completely separate component of the computer system that is
connected to the processor via the memory bus. Transferring data between main
memory and the registers via the memory bus takes a significant
amount of time. Thus, if there were no registers and the ALU had to read
data directly from main memory for each calculation, computers would run
very slowly. However, because the registers enable the computer to store data
near the ALU, where it can be accessed nearly instantaneously, the computer’s
computational speed is decoupled somewhat from the speed of main memory.
(We’ll discuss the problem of memory access speeds and computational
performance in more detail in Chapter 11, when we talk about caches.)
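A back-of-the-envelope model shows why keeping working data in registers matters. The cycle counts below are invented round numbers chosen only for illustration (they are not measurements of any real processor), and the function names are likewise hypothetical.

```python
# Assumed, illustrative costs in clock cycles; real values vary widely
# from one machine to another.
REGISTER_ACCESS = 1
MEMORY_ACCESS = 100

def cost_without_registers(n):
    """Cost of summing n numbers when the running total lives in memory."""
    # Each addition reads the total and one number from memory,
    # then writes the new total back: three memory accesses.
    return n * 3 * MEMORY_ACCESS

def cost_with_registers(n):
    """Cost of summing n numbers when the running total lives in a register."""
    # Each number costs one memory access to load, plus a register read
    # and a register write for the running total.
    per_number = MEMORY_ACCESS + 2 * REGISTER_ACCESS
    # One final memory access stores the finished total.
    return n * per_number + MEMORY_ACCESS

print(cost_without_registers(1000))  # prints 300000
print(cost_with_registers(1000))     # prints 102100
```

Under these assumed costs, the register version does roughly a third of the memory traffic, which is the sense in which computational speed is partly decoupled from main memory speed.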
The File-Clerk Model Revisited and Expanded
To return to our file-clerk metaphor, we can think of main memory as a
document storage room located on another floor and the registers as a
small, personal filing cabinet where the file clerk places the papers on
which he’s currently working. The clerk doesn’t really know anything
about the document storage room—what it is or where it’s located—because
his desk and his personal filing cabinet are all he concerns himself with. For
documents that are in the storage room, there’s another office worker, the
office secretary, whose job it is to locate files in the storage room and retrieve them for the clerk.
This secretary represents a few different units within the processor, all
of which we’ll meet in Chapter 4. For now, suffice it to say that when the boss
wants the clerk to work on a file that’s not in the clerk’s personal filing
cabinet, the secretary must first be ordered, via a message from the boss, to
retrieve the file from the storage room and place it in the clerk’s cabinet so
that the clerk can access it when he gets the order to begin working on it.
An Example: Adding Two Numbers
To translate this office example into computing terms, let’s look at how the
computer uses main memory, the register file, and the ALU to add two
numbers.
To add two numbers stored in main memory, the computer must
perform these steps:
1. Load the two operands from main memory into the two source registers.
2. Add the contents of the source registers and place the results in the
   destination register, using the ALU. To do so, the ALU must perform
   these steps:
   a. Read the contents of registers A and B into the ALU’s input ports.
   b. Add the contents of A and B in the ALU.
   c. Write the result to register C via the ALU’s output port.
3. Store the contents of the destination register in main memory.
Since steps 2a, 2b, and 2c all take a trivial amount of time to complete,
relative to steps 1 and 3, we can ignore them. Hence our addition looks
like this:
1. Load the two operands from main memory into the two source registers.
2. Add the contents of the source registers, and place the results in the
   destination register, using the ALU.
3. Store the contents of the destination register in main memory.
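The load, add, and store steps can be modeled as three small Python functions. This is a minimal sketch under stated assumptions: the list `memory`, the dictionary `registers`, the addresses 4, 5, and 6, and the operand values are all hypothetical and chosen only to make the example concrete.

```python
# Hypothetical main memory (a list of cells) and register file.
memory = [0] * 16
memory[4], memory[5] = 7, 35   # arbitrary operands at arbitrary addresses
registers = {"A": 0, "B": 0, "C": 0, "D": 0}

def load(address, dest):
    """Copy a value from main memory into a register."""
    registers[dest] = memory[address]

def add(src1, src2, dest):
    """Have the ALU add two source registers into a destination register."""
    registers[dest] = registers[src1] + registers[src2]

def store(src, address):
    """Copy a register's contents into main memory."""
    memory[address] = registers[src]

# Step 1: load the two operands into the source registers.
load(4, "A")
load(5, "B")
# Step 2: add the source registers and place the result in the destination.
add("A", "B", "C")
# Step 3: store the destination register in main memory.
store("C", 6)

print(memory[6])  # prints 42
```

The ALU never touches `memory` directly in this model; it only sees values that a load has already placed in the registers.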
The existence of main memory means that the user (the boss in our
file-clerk analogy) must manage the flow of information between main
memory and the CPU’s registers. This means that the user must issue
instructions to more than just the processor’s ALU; he or she must also
issue instructions to the parts of the CPU that handle memory traffic.
Thus, the preceding three steps are representative of the kinds of instructions
you find when you take a close look at the code stream.
A Closer Look at the Code Stream: The Program
At the beginning of this chapter, I defined the code stream as consisting of
“an ordered sequence of operations,” and this definition is fine as far as it
goes. But in order to dig deeper, we need a more detailed picture of what the
code stream is and how it works.
The term operations suggests a series of simple arithmetic operations
like addition or subtraction, but the code stream consists of more than just
arithmetic operations. Therefore, it would be better to say that the code
stream consists of an ordered sequence of instructions. Instructions, generally
speaking, are commands that tell the whole computer (not just the ALU,
but multiple parts of the machine) exactly what actions to perform. As we’ve
seen, a computer’s list of potential actions encompasses more than just
simple arithmetic operations.
General Instruction Types
Instructions are grouped into ordered lists that, when taken as a whole,
tell the different parts of the computer how to work together to perform a
specific task, like grayscaling an image or playing a media file. These ordered
lists of instructions are called programs, and they consist of a few basic types
of instructions.
In modern RISC microprocessors, the act of moving data between
memory and the registers is under the explicit control of the code stream, or
program. So if a programmer wants to add two numbers that are located in
main memory and then store the result back in main memory, he or she
must write a list of instructions (a program) to tell the computer exactly what
to do. The program must consist of:
• a load instruction to move the two numbers from memory into the
  registers
• an add instruction to tell the ALU to add the two numbers
• a store instruction to tell the computer to place the result of the addition
  back into memory, overwriting whatever was previously there
These operations fall into two main categories:
Arithmetic instructions
These instructions tell the ALU to perform an arithmetic calculation
(for example, add, sub, mul, div).
Memory-access instructions
These instructions tell the parts of the processor that deal with main
memory to move data from and to main memory (for example, load
and store).
NOTE
We’ll discuss a third type of instruction, the branch instruction, shortly. Branch
instructions are technically a special type of memory-access instruction, but they access
code storage instead of data storage. Still, it’s easier to treat branches as a third category
of instruction.
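To make the idea of a program as an ordered list of instructions concrete, here is a small Python sketch that walks such a list and dispatches on the two categories just described. The tuple layout, the `run` function, the addresses, and the values are assumptions invented for illustration; they are not the instruction format of any real or hypothetical machine discussed in this book. The sketch also uses one load per number for simplicity.

```python
# Arithmetic instructions: each maps two source values to a result.
ARITHMETIC = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def run(program, memory, registers):
    """Execute an ordered list of instructions, one after another."""
    for opcode, *operands in program:
        if opcode in ARITHMETIC:
            # Arithmetic instruction: the ALU works only on registers.
            src1, src2, dest = operands
            registers[dest] = ARITHMETIC[opcode](registers[src1], registers[src2])
        elif opcode == "load":
            # Memory-access instruction: main memory -> register.
            address, dest = operands
            registers[dest] = memory[address]
        elif opcode == "store":
            # Memory-access instruction: register -> main memory.
            src, address = operands
            memory[address] = registers[src]
        else:
            raise ValueError(f"unknown instruction: {opcode}")

memory = {12: 20, 13: 22, 14: 0}              # hypothetical addresses and data
registers = {"A": 0, "B": 0, "C": 0, "D": 0}
program = [
    ("load", 12, "A"),
    ("load", 13, "B"),
    ("add", "A", "B", "C"),
    ("store", "C", 14),
]
run(program, memory, registers)
print(memory[14])  # prints 42
```

Because the program is just data, reordering or replacing entries in the list changes what the machine does without changing `run` itself.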
The arithmetic instruction fits with our calculator metaphor and is the
type of instruction most familiar to anyone who’s worked with computers.
Instructions like integer and floating-point addition, subtraction, multiplication,
and division all fall under this general category.
NOTE
In order to simplify the discussion and reduce the number of terms, I’m temporarily
including logical operations like AND, OR, NOT, NOR, and so on, under the general
heading of arithmetic instructions. The difference between arithmetic and logical
operations will be introduced in Chapter 2.
The memory-access instruction is just as important as the arithmetic
instruction, because without access to main memory’s data storage regions,
the computer would have no way to get data into or out of the register file.
To show you how memory-access and arithmetic operations work together
within the context of the code stream, the remainder of this chapter will use a
series of increasingly detailed examples. All of the examples are based on a
simple, hypothetical computer, which I’ll call the DLW-1.
The DLW-1’s Basic Architecture and Arithmetic Instruction Format
The DLW-1 microprocessor consists of an ALU (along with a few other units