Authors: Jon Stokes
Tags: #Computers, #Systems Architecture, #General, #Microprocessors
[Diagram: stored instructions pass through the CPU's Fetch, Decode, Execute, and Write stages into the Completed Instructions region, shown at 5ns through 11ns]
Figure 3-10: Pipeline stalls in a four-stage pipeline would look different
without the effect of the “bubbles.”
Pipeline stalls—or bubbles—reduce a pipeline’s average instruction
throughput, because they prevent the pipeline from attaining the maximum
throughput of one finished instruction per cycle. In Figure 3-10, the
orange instruction has stalled in the fetch stage for two extra cycles, creating two bubbles that will propagate through the pipeline. (Again, the bubble is
simply a way of signifying that the pipeline stage in which the bubble sits is
doing no work during that cycle.) Once the instructions below the bubble
have completed, the processor will complete no new instructions until the
bubbles move out of the pipeline. So at the ends of clock cycles 9 and 10, no
new instructions are added to the “Completed Instructions” region; normally,
two new instructions would be added to the region at the ends of these two
cycles. Because of the bubbles, though, the processor is two instructions
behind schedule when it hits the 11th clock cycle and begins racking up
completed instructions again.
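The timeline above can be sketched as a small model. This is only an illustration under stated assumptions: the function names and parameters are hypothetical, and the stalled instruction is assumed to be the sixth one, since the text has five instructions completing before the bubbles reach the write stage.

```python
def completion_cycle(k, depth=4, stalled_instr=None, extra_cycles=0):
    """Return the clock cycle at whose end instruction k (1-indexed) completes.

    With no stall, instruction k is fetched in cycle k and leaves the last
    stage at the end of cycle k + depth - 1. If instruction `stalled_instr`
    sits in fetch for `extra_cycles` extra cycles, it and every instruction
    behind it finish that many cycles late.
    """
    ideal = k + depth - 1
    if stalled_instr is not None and k >= stalled_instr:
        return ideal + extra_cycles
    return ideal


def completed_by(cycle, **pipeline):
    """Count the instructions that have completed by the end of `cycle`."""
    count = 0
    # Completion cycles increase with k, so stop at the first late instruction.
    while completion_cycle(count + 1, **pipeline) <= cycle:
        count += 1
    return count


# Compare the ideal pipeline with the assumed two-cycle stall (sixth instruction).
for cycle in range(8, 13):
    ideal = completed_by(cycle)
    stalled = completed_by(cycle, stalled_instr=6, extra_cycles=2)
    print(cycle, ideal, stalled)
```

For cycle 10 this model gives seven completed instructions without the stall and five with it, and the stalled count does not move during cycles 9 and 10.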
The more of these bubbles that crop up in the pipeline, the farther away
the processor’s actual instruction throughput is from its maximum instruction
throughput. In the preceding example, the processor should ideally have
completed seven instructions by the time it finishes the 10th clock cycle, for
an average instruction throughput of 0.7 instructions per clock. (Remember,
the maximum instruction throughput possible under ideal conditions is one
instruction per clock, but many more cycles with no bubbles would be needed
to approach that maximum.) But because of the pipeline stall, the processor
only completes five instructions in 10 clocks, for an average instruction
throughput of 0.5 instructions per clock. 0.5 instructions per clock is half the
theoretical maximum instruction throughput, but of course the processor
spent a few clocks filling the pipeline, so it couldn’t have achieved that after 10 clocks, even under ideal conditions. More important is the fact that 0.5
instructions per clock is only 71 percent of the throughput that it could have
achieved were there no stall (i.e., 0.7 instructions per clock). Because pipeline stalls decrease the processor’s average instruction throughput, they increase
the amount of time that it takes to execute the currently running program.
If the program in the preceding example consisted of only the seven instructions
pictured, then the pipeline stall would have stretched its execution time
from 10 clock cycles to 12, a 20 percent increase.
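The throughput figures quoted above follow from simple division. A minimal sketch of the arithmetic (the helper name is hypothetical, and the instruction counts are taken directly from the example in the text):

```python
def average_throughput(completed, cycles):
    """Average instruction throughput in instructions per clock."""
    return completed / cycles


ideal = average_throughput(7, 10)    # seven instructions in 10 clocks, no stall
stalled = average_throughput(5, 10)  # five instructions in 10 clocks, with the stall

# Express the stalled rate as a share of the rate achievable without the stall.
print(f"ideal: {ideal:.1f}  stalled: {stalled:.1f}  ratio: {stalled / ideal:.0%}")
```

The printed ratio is the 71 percent mentioned in the text: the stalled pipeline delivers 0.5 of a possible 0.7 instructions per clock.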
Look at the graph in Figure 3-11; it shows what that two-cycle stall does to
the average instruction throughput.
[Graph: average instruction throughput (instructions/clock, axis marked 0.2 to 1) plotted against clock cycles (axis marked 20 to 100)]
Figure 3-11: Average instruction throughput of a four-stage pipeline with a two-cycle stall
The processor’s average instruction throughput stops rising and begins
to plummet when the first bubble hits the write stage, and it doesn’t recover
until the bubbles have left the pipeline.
To get an even better picture of the impact that stalls can have on a
pipeline’s average instruction throughput, let’s now look at the impact that
a stall of 10 cycles (starting in the fetch stage of the 18th cycle) would have
over the course of 100 cycles in the four-stage pipeline described so far. Look
at the graph in Figure 3-12.
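A curve of this kind can be approximated with a short simulation. The sketch below is an illustration, not the method used to produce the book's graphs. The function and parameter names are hypothetical, and it assumes that `stall_start` is the first extra cycle the stalled instruction spends in the fetch stage, so that the first bubble reaches the last stage `depth - 2` cycles later.

```python
def throughput_curve(n_cycles, depth=4, stall_start=None, stall_len=0):
    """Average throughput (completed instructions / elapsed cycles) per cycle.

    Returns a list whose entry i is the average at the end of cycle i + 1.
    """
    completed = 0
    averages = []
    for cycle in range(1, n_cycles + 1):
        filling = cycle < depth  # the pipeline has not yet produced a result
        bubble_in_last_stage = False
        if stall_start is not None:
            # Assumption: a bubble created in cycle `stall_start` appears in
            # the stage after fetch and reaches the last stage depth - 2
            # cycles later.
            first = stall_start + depth - 2
            bubble_in_last_stage = first <= cycle < first + stall_len
        if not filling and not bubble_in_last_stage:
            completed += 1
        averages.append(completed / cycle)
    return averages


ideal = throughput_curve(100)
two_cycle = throughput_curve(100, stall_start=7, stall_len=2)
ten_cycle = throughput_curve(100, stall_start=18, stall_len=10)

# Print a few sample points from each curve.
for cycle in (10, 19, 29, 100):
    print(cycle, ideal[cycle - 1], two_cycle[cycle - 1], ten_cycle[cycle - 1])
```

In this model the average climbs while instructions complete, falls on every cycle in which a bubble occupies the write stage, and resumes climbing once the bubbles have drained. A longer stall produces a deeper dip and a slower recovery.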
After the first bubble of the stall hits the write stage in the 20th clock, the
average instruction throughput stops increasing and begins to decrease. For