Inside the Machine: An Illustrated Introduction to Microprocessors and Computer Architecture (57 page)

Author: Jon Stokes

Figure 3-10: Pipeline stalls in a four-stage pipeline would look different without the effect of the “bubbles.” (Diagram: stored instructions pass through the CPU's fetch, decode, execute, and write stages and leave as completed instructions, along a time axis marked in nanoseconds.)

Pipeline stalls—or bubbles—reduce a pipeline’s average instruction

throughput, because they prevent the pipeline from attaining the maximum

throughput of one finished instruction per cycle. In Figure 3-10, the

orange instruction has stalled in the fetch stage for two extra cycles, creating two bubbles that will propagate through the pipeline. (Again, the bubble is

simply a way of signifying that the pipeline stage in which the bubble sits is

doing no work during that cycle.) Once the instructions below the bubble

have completed, the processor will complete no new instructions until the

bubbles move out of the pipeline. So at the ends of clock cycles 9 and 10, no

new instructions are added to the “Completed Instructions” region; normally,

two new instructions would be added to the region at the ends of these two

cycles. Because of the bubbles, though, the processor is two instructions

behind schedule when it hits the 11th clock cycle and begins racking up

completed instructions again.

The more of these bubbles that crop up in the pipeline, the farther away

the processor’s actual instruction throughput is from its maximum instruction

throughput. In the preceding example, the processor should ideally have

completed seven instructions by the time it finishes the 10th clock cycle, for

an average instruction throughput of 0.7 instructions per clock. (Remember,

the maximum instruction throughput possible under ideal conditions is one

instruction per clock, but many more cycles with no bubbles would be needed

to approach that maximum.) But because of the pipeline stall, the processor

only completes five instructions in 10 clocks, for an average instruction

throughput of 0.5 instructions per clock. 0.5 instructions per clock is half the

theoretical maximum instruction throughput, but of course the processor

spent a few clocks filling the pipeline, so it couldn’t have achieved that after 10 clocks, even under ideal conditions. More important is the fact that 0.5

instructions per clock is only 71 percent of the throughput that it could have

achieved were there no stall (i.e., 0.7 instructions per clock). Because pipeline stalls decrease the processor’s average instruction throughput, they increase

the amount of time that it takes to execute the currently running program.

If the program in the preceding example consisted of only the seven instructions

pictured, then the pipeline stall would have stretched its execution from

10 clocks to 12, a 20 percent increase in program execution time.
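
The throughput figures above can be checked with a small simulation. What follows is only a sketch, not anything taken from the book: the function name `completed_after` and its parameters are hypothetical, and it assumes the stalled instruction sits in the fetch stage during cycles 7 and 8, which is inferred from the statement that no instructions complete at the ends of cycles 9 and 10.

```python
def completed_after(total_cycles, stall_start=None, stall_length=0, stages=4):
    """Simulate a simple in-order pipeline and return how many instructions
    have completed by the end of `total_cycles`.

    Assumption: a stall holds one instruction in the fetch stage for
    `stall_length` extra cycles, beginning in cycle `stall_start`.
    """
    pipeline = [None] * stages   # index 0 is fetch, index -1 is write
    next_instruction = 1
    completed = 0
    for cycle in range(1, total_cycles + 1):
        stalled = (stall_start is not None
                   and stall_start <= cycle < stall_start + stall_length)
        if stalled:
            # Fetch keeps its instruction; a bubble (None) enters decode.
            pipeline = [pipeline[0], None] + pipeline[1:-1]
        else:
            # Every instruction advances one stage; a new one is fetched.
            pipeline = [next_instruction] + pipeline[:-1]
            next_instruction += 1
        if pipeline[-1] is not None:
            completed += 1       # the write stage finished an instruction
    return completed


ideal = completed_after(10)
stalled = completed_after(10, stall_start=7, stall_length=2)
print(ideal / 10, stalled / 10)      # 0.7 0.5
print(round(stalled / ideal, 2))     # 0.71
```

Under these assumptions the simulation reproduces the seven and five completed instructions after 10 clocks, and the 71 percent ratio between the two throughputs.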

Look at the graph in Figure 3-11; it shows what that two-cycle stall does to

the average instruction throughput.

Figure 3-11: Average instruction throughput of a four-stage pipeline with a two-cycle stall (y-axis: average instruction throughput in instructions/clock; x-axis: clock cycles)

The processor’s average instruction throughput stops rising and begins

to plummet when the first bubble hits the write stage, and it doesn’t recover

until the bubbles have left the pipeline.
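
The shape of that curve can be reproduced as a running average. The sketch below is illustrative only: the `completions` list is written by hand from the description of Figure 3-10 (three cycles to fill the pipeline, no completions in cycles 9 and 10), and the name `average_throughput` is hypothetical.

```python
from itertools import accumulate

# 1 if an instruction completes at the end of that cycle, else 0.
# Cycles 1-3 fill the pipeline; cycles 9-10 carry the two bubbles
# through the write stage.
completions = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1]


def average_throughput(per_cycle):
    """Return the running average (instructions per clock) after each cycle."""
    totals = accumulate(per_cycle)
    return [total / cycle for cycle, total in enumerate(totals, start=1)]


curve = average_throughput(completions)
for cycle, value in enumerate(curve, start=1):
    print(f"cycle {cycle:2d}: {value:.3f}")
```

With these inputs the average reaches 0.625 at the end of cycle 8, falls to 0.5 by the end of cycle 10, and starts climbing again in cycle 11.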

To get an even better picture of the impact that stalls can have on a

pipeline’s average instruction throughput, let’s now look at the impact that

a stall of 10 cycles (starting in the fetch stage of the 18th cycle) would have

over the course of 100 cycles in the four-stage pipeline described so far. Look

at the graph in Figure 3-12.
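
For a longer run, the same average can be written in closed form. This is a rough sketch under stated assumptions rather than the book's own method: the function name and parameters are hypothetical, and it assumes the first bubble of the 10-cycle stall reaches the write stage in the 20th clock and that no other stalls occur.

```python
def stalled_average_throughput(cycle, first_bubble=20, stall_length=10, stages=4):
    """Average instructions per clock at the end of `cycle`.

    Assumption: bubbles occupy the write stage for `stall_length`
    consecutive cycles, starting in cycle `first_bubble`.
    """
    # Without stalls, one instruction completes per cycle once the
    # pipeline has filled (after `stages - 1` cycles).
    ideal = max(0, cycle - (stages - 1))
    # Each bubble that has reached the write stage costs one completion.
    lost = min(max(0, cycle - first_bubble + 1), stall_length)
    return (ideal - lost) / cycle


for cycle in (19, 20, 29, 30, 100):
    print(cycle, round(stalled_average_throughput(cycle), 3))
```

Under these assumptions the average is about 0.84 at the end of clock 19, bottoms out at about 0.55 at clock 29, and has only recovered to 0.87 by clock 100, compared with 0.97 for a run with no stall.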

After the first bubble of the stall hits the write stage in the 20th clock, the

average instruction throughput stops increasing and begins to decrease. For