Authors: Jon Stokes
Tags: #Computers, #Systems Architecture, #General, #Microprocessors
In the previous chapter, you learned that a computer repeats three basic
steps over and over again in order to execute a program:
1. Fetch the next instruction from the address stored in the program counter and load that instruction into the instruction register. Increment the program counter.
2. Decode the instruction in the instruction register.
3. Execute the instruction in the instruction register.
You should also recall that step 3, the execute step, can itself consist of
multiple sub-steps, depending on the type of instruction being executed
(arithmetic, memory access, or branch). In the case of the arithmetic
instruction add A, B, C, the example we used last time, the three sub-steps
are as follows:
1. Read the contents of registers A and B.
2. Add the contents of A and B.
3. Write the result back to register C.
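To make those three sub-steps concrete, here is a minimal Python sketch of them. It models the register file as a dictionary; the register names A through D come from the DLW-1 examples in this book, but the values and the helper function are purely illustrative, not something the hardware actually runs.

```python
# Toy model of the execute sub-steps for "add A, B, C" on the DLW-1.
# The register names come from the chapter; the values are made up.
registers = {"A": 7, "B": 5, "C": 0, "D": 0}

def execute_add(src1, src2, dest):
    # Sub-step 1: read the contents of the two source registers.
    operand1 = registers[src1]
    operand2 = registers[src2]
    # Sub-step 2: add the two operands.
    result = operand1 + operand2
    # Sub-step 3: write the result back to the destination register.
    registers[dest] = result

execute_add("A", "B", "C")
print(registers["C"])  # prints 12
```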
Thus the expanded list of actions required to execute an arithmetic instruction is as follows (substitute any other arithmetic instruction for add in the following list to see how it’s executed):
1. Fetch the next instruction from the address stored in the program counter and load that instruction into the instruction register. Increment the program counter.
2. Decode the instruction in the instruction register.
3. Execute the instruction in the instruction register. Because the instruction is not a branch instruction but an arithmetic instruction, send it to the arithmetic logic unit (ALU).
   a. Read the contents of registers A and B.
   b. Add the contents of A and B.
   c. Write the result back to register C.
At this point, I need to make a modification to the preceding list. For
reasons we’ll discuss in detail when we talk about the instruction window
in Chapter 5, most modern microprocessors treat sub-steps 3a and 3b as
a group, while they treat step 3c, the register write, separately. To reflect this conceptual and architectural division, this list should be modified to look as
follows:
1. Fetch the next instruction from the address stored in the program counter, and load that instruction into the instruction register. Increment the program counter.
2. Decode the instruction in the instruction register.
3. Execute the instruction in the instruction register. Because the instruction is not a branch instruction but an arithmetic instruction, send it to the ALU.
   a. Read the contents of registers A and B.
   b. Add the contents of A and B.
4. Write the result back to register C.
In a modern processor, these four steps are repeated over and over again until the program is finished executing. These are, in fact, the four stages in a classic RISC¹ pipeline. (I’ll define the term pipeline shortly; for now, just think of a pipeline as a series of stages that each instruction in the code stream must pass through when the code stream is being executed.) Here are the four stages in their abbreviated form, the form in which you’ll most often see them:
1. Fetch
2. Decode
3. Execute
4. Write (or “write-back”)
Each of these stages could be said to represent one phase in the lifecycle of an instruction. An instruction starts out in the fetch phase, moves to the decode phase, then to the execute phase, and finally to the write phase. As I mentioned in “The Clock” on page 29, each phase takes a fixed, but by no means equal, amount of time. In most of the example processors with which you’ll be working in this chapter, all four phases take an equal amount of time; this is not usually the case in real-world processors. In any case, if the DLW-1 takes exactly 1 nanosecond (ns) to complete each phase, then the DLW-1 can finish one instruction every 4 ns.
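To see where that 4 ns figure comes from, here is a small back-of-the-envelope sketch in Python. The 1 ns-per-phase figure is the chapter’s example value for the DLW-1; the function and the instruction counts are just illustration.

```python
# Back-of-the-envelope timing for the non-pipelined DLW-1.
PHASES = ("fetch", "decode", "execute", "write")
NS_PER_PHASE = 1  # the chapter's example value: each phase takes 1 ns

def nanoseconds_to_run(num_instructions):
    # Without pipelining, each instruction must pass through all four
    # phases before the next one can begin, so every instruction costs
    # len(PHASES) * NS_PER_PHASE nanoseconds.
    return num_instructions * len(PHASES) * NS_PER_PHASE

print(nanoseconds_to_run(1))    # 4 ns for one instruction
print(nanoseconds_to_run(100))  # 400 ns for one hundred instructions
```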
¹ The term RISC is an acronym for Reduced Instruction Set Computing. I’ll cover this term in more detail in Chapter 5.
Basic Instruction Flow
One useful division that computer architects often employ when talking about CPUs is that of front end versus back end. As you already know, when instructions are fetched from main memory, they must be decoded for execution. This fetching and decoding takes place in the processor’s front end.
You can see in Figure 3-1 that the front end roughly corresponds to the
control and I/O units in the previous chapter’s diagram of the DLW-1’s
programming model. The ALU and registers constitute the back end of the
DLW-1. Instructions make their way from the front end down through the
back end, where the work of number crunching gets done.
Figure 3-1: Front end versus back end (front end: the control unit, with the program counter and instruction register, plus the I/O unit and its data and address buses; back end: registers A–D, the processor status word, and the ALU)
We can now modify Figure 1-4 to show all four phases of execution
(see Figure 3-2).
Figure 3-2: Four phases of execution (fetch, decode, execute, write)
From here on out, we’re going to focus primarily on the code stream,
and more specifically, on how instructions enter and flow through the
microprocessor, so the diagrams will need to leave out the data and results
streams entirely. Figure 3-3 presents a microprocessor’s basic instruction flow
in a manner that’s straightforward, yet easily elaborated upon.
Figure 3-3: Basic instruction flow (front end: fetch and decode; back end: execute in the ALU, then write)
In Figure 3-3, instructions flow from the front end’s fetch and decode
phases into the back end’s execute and write phases. (Don’t worry if this
seems too simple. As the level of complexity of the architectures under
discussion increases, so will the complexity of the diagrams.)
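If it helps to see that division in code form, here is a rough Python sketch of Figure 3-3’s instruction flow. The split of labor (fetch and decode in the front end; execute and write in the back end) is taken from the text, but the data structures and the toy instruction format are my own illustration, not the DLW-1’s real instruction encoding.

```python
# Rough, illustrative model of Figure 3-3: instructions flow from the
# front end (fetch, decode) into the back end (execute, write).
# Each "instruction" here is just a tuple: (opcode, src1, src2, dest).
program = [("add", "A", "B", "C"), ("add", "C", "D", "D")]
registers = {"A": 1, "B": 2, "C": 0, "D": 3}
program_counter = 0

def front_end():
    global program_counter
    # Fetch: read the instruction the program counter points to,
    # then increment the program counter.
    instruction = program[program_counter]
    program_counter += 1
    # Decode: break the instruction into an opcode and its operands.
    return instruction

def back_end(opcode, src1, src2, dest):
    # Execute: this toy ALU only knows how to add.
    if opcode != "add":
        raise ValueError(f"unknown opcode: {opcode}")
    result = registers[src1] + registers[src2]
    # Write: store the result back into the destination register.
    registers[dest] = result

while program_counter < len(program):
    back_end(*front_end())

print(registers)  # {'A': 1, 'B': 2, 'C': 3, 'D': 6}
```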
Pipelining Explained
Let’s say my friends and I have decided to go into the automotive manufacturing business and that our first product is to be a sport utility vehicle
(SUV). After some research, we determine that there are five stages in
the SUV-building process:
Stage 1: Build the chassis.
Stage 2: Drop the engine into the chassis.
Stage 3: Put the doors, a hood, and coverings on the chassis.
Stage 4: Attach the wheels.
Stage 5: Paint the SUV.
Each of these stages requires the use of highly trained workers with very
specialized skill sets—workers who are good at building chasses don’t know
much about engines, bodywork, wheels, or painting, and likewise for engine
builders, painters, and the other crews. So when we make our first attempt to
put together an SUV factory, we hire and train five crews of specialists, one
for each stage of the SUV-building process. There’s one crew to build the
chassis, one to drop the engines, one to put the doors, hood, and coverings
on the chassis, another for the wheels, and a painting crew. Finally, because
the crews are so specialized and efficient, each stage of the SUV-building
process takes a crew exactly one hour to complete.
Now, since my friends and I are computer types and not industrial engineers, we had a lot to learn about making efficient use of factory resources.
We based the functioning of our first factory on the following plan: Place all
five crews in a line on the factory floor, and have the first crew start an SUV at Stage 1. After Stage 1 is complete, the Stage 1 crew passes the partially finished SUV off to the Stage 2 crew and then hits the break room to play some foosball, while the Stage 2 crew builds the engine and drops it in. Once the
Stage 2 crew is done, the SUV moves down to Stage 3, and the Stage 3 crew
takes over, while the Stage 2 crew joins the Stage 1 crew in the break room.
The SUV moves on down the line through all five stages in this way, with
only one crew working on one stage at any given time while the rest of the
crews sit idle. Once the completed SUV finishes Stage 5, the crew at Stage 1
starts on another SUV. At this rate, it takes exactly five hours to finish a single SUV, and our factory completes one SUV every five hours.
In Figure 3-4, you can see the SUV pass through all five stages. The SUV
enters the factory floor at the beginning of the first hour, where the Stage 1
crew begins work on it. Notice that all of the other crews are sitting idle while the Stage 1 crew does its work. At the beginning of the second hour, the
Stage 2 crew takes over, and the other four crews sit idle while waiting on
Stage 2. This process continues as the SUV moves down the line, until at the beginning of the sixth hour, one SUV stands completed while another has entered Stage 1.
Figure 3-4: The lifecycle of an SUV in a non-pipelined factory (the factory floor and the completed SUVs shown hour by hour, from 1hr through 6hr)
Fast-forward one year. Our SUV, the Extinction LE, is selling like . . .
well, it’s selling like an SUV, which means it’s doing pretty well. In fact, our SUV is selling so well that we’ve attracted the attention of the military and
have been offered a contract to provide SUVs to the U.S. Army on an ongoing
basis. The Army likes to order multiple SUVs at a time; one order might
come in for 10 SUVs, and another order might come in for 500 SUVs. The
more of these orders that we can fill each fiscal year, the more money we can
make during that same period and the better our balance sheet looks. This,
of course, means that we need to find a way to increase the number of SUVs
that our factory can complete per hour, known as our factory’s SUV completion rate. By completing more SUVs per hour, we can fill the Army’s orders faster and make more money each year.
The most intuitive way to go about increasing our factory’s SUV completion rate is to try and decrease the production time of each SUV. If we can
get the crews to work twice as fast, our factory can produce twice as many
SUVs in the same amount of time. Our crews are already working as hard
as they can, though, so unless there’s a technological breakthrough that
increases their productivity, this option is off the table for now.
Since we can’t speed up our crews, we can always use the brute-force
approach and just throw money at the problem by building a second assembly
line. If we hire and train five new crews to form a second assembly line, also
capable of producing one car every five hours, we can complete a grand total
of two SUVs every five hours from the factory floor, double the SUV completion rate of our present factory. This doesn’t seem like a very efficient use of factory resources, though, since not only do we have twice as many crews
working at once but we also have twice as many crews in the break room at
once. There has to be a better way.
Faced with a lack of options, we hire a team of consultants to figure out a
clever way to increase overall factory productivity without either doubling the
number of crews or increasing each individual crew’s productivity. One year
and thousands of billable hours later, the consultants hit upon a solution.
Why let our crews spend four-fifths of their work day in the break room,
when they could be doing useful work during that time? With proper scheduling of the existing five crews, our factory can complete one SUV each hour, thus drastically improving both the efficiency and the output of our assembly line. The revised workflow would look as follows:
1. The Stage 1 crew builds a chassis.
2. Once the chassis is complete, they send it on to the Stage 2 crew.
3. The Stage 2 crew receives the chassis and begins dropping the engine in, while the Stage 1 crew starts on a new chassis.
4. When both Stage 1 and Stage 2 crews are finished, the Stage 2 crew’s work advances to Stage 3, the Stage 1 crew’s work advances to Stage 2, and the Stage 1 crew starts on a new chassis.
Figure 3-5 illustrates this workflow in action. Notice that multiple crews
have multiple SUVs simultaneously in progress on the factory floor. Compare
this figure to Figure 3-4, where only one crew is active at a time and only one
SUV is on the factory floor at a time.
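To put some numbers behind the analogy, here is a small Python sketch of the two schedules; it is my own illustration, not something from the original text. With five one-hour stages, the non-pipelined factory completes one SUV every five hours, while the pipelined factory, once its first SUV rolls off the line at the end of hour five, completes one more every hour after that.

```python
# Illustrative throughput comparison for the five-stage SUV factory.
STAGES = 5           # chassis, engine, body, wheels, paint
HOURS_PER_STAGE = 1  # each crew takes exactly one hour per stage

def completed_without_pipelining(hours):
    # Only one SUV is on the factory floor at a time, so each SUV takes
    # STAGES * HOURS_PER_STAGE hours from start to finish.
    return hours // (STAGES * HOURS_PER_STAGE)

def completed_with_pipelining(hours):
    # A new SUV enters Stage 1 every hour. The first SUV emerges once the
    # line fills (after five hours); one more emerges every hour after that.
    fill_time = STAGES * HOURS_PER_STAGE
    return 0 if hours < fill_time else hours - fill_time + 1

for hours in (5, 10, 100):
    print(hours, completed_without_pipelining(hours), completed_with_pipelining(hours))
# 5 hours:   1 SUV  vs. 1 SUV
# 10 hours:  2 SUVs vs. 6 SUVs
# 100 hours: 20 SUVs vs. 96 SUVs
```

As the runs get longer, the pipelined factory’s advantage approaches a factor of five, one new SUV per hour instead of one every five hours.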