Inside the Machine: An Illustrated Introduction to Microprocessors and Computer Architecture (73 page)

Authors: jon stokes


unit examines the 512-entry BHT and decides to speculatively take a branch,

it doesn’t have to go to code storage to fetch the first instruction from that

branch’s target address. Instead, the BPU loads the branch’s target instruction

directly from the BTIC into the instruction queue, which means that the

processor doesn’t have to wait around for the fetch logic to go out and fetch

the target instruction from code storage. This scheme saves valuable cycles,

and it helps keep performance-killing bubbles out of the 750’s pipeline.
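
The idea behind the BTIC can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class and function names, the cycle costs, the 64-entry capacity, and the first-in-first-out replacement are all illustrative choices, not details taken from the text above. It shows only the core point, that a hit supplies the branch target's first instruction without a trip to code storage.

```python
from dataclasses import dataclass, field

# Illustrative cycle costs. These numbers are assumptions for the sketch,
# not measured timings for the 750.
FETCH_FROM_CODE_STORAGE_CYCLES = 2
BTIC_HIT_CYCLES = 0


@dataclass
class BranchTargetInstructionCache:
    """Maps a branch's address to the first instruction at its target."""
    capacity: int = 64  # assumed size for this sketch
    entries: dict = field(default_factory=dict)

    def lookup(self, branch_addr):
        """Return the cached target instruction, or None on a miss."""
        return self.entries.get(branch_addr)

    def fill(self, branch_addr, target_instruction):
        """Record a target instruction, evicting the oldest entry if full."""
        if branch_addr not in self.entries and len(self.entries) >= self.capacity:
            oldest = next(iter(self.entries))  # dicts keep insertion order
            del self.entries[oldest]
        self.entries[branch_addr] = target_instruction


def take_branch(btic, code_storage, branch_addr, target_addr):
    """Return (target_instruction, bubble_cycles) for a predicted-taken branch."""
    cached = btic.lookup(branch_addr)
    if cached is not None:
        # Hit: the instruction goes straight into the instruction queue.
        return cached, BTIC_HIT_CYCLES
    # Miss: the fetch logic has to go out to code storage.
    instruction = code_storage[target_addr]
    btic.fill(branch_addr, instruction)
    return instruction, FETCH_FROM_CODE_STORAGE_CYCLES
```

In this model, the first time a given branch is taken it pays the fetch penalty, and every later taken instance of the same branch gets its target instruction with no bubble, as long as the entry has not been evicted.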

Summary: The PowerPC 750 in Historical Context

In spite of its short pipeline and small instruction window, the 750 packed quite a punch. It managed to outperform the 604, partially because of a dedicated

back-side L2 cache interface that allowed it to offload L2 traffic from the front-side bus. It was so successful that a 604 derivative was scrapped in favor of just building on the 750. The 750 and its immediate successors, all of which went

under the name of G3, eventually found widespread use both in embedded

devices and across Apple’s entire product line, from its portables to its

workstations.

The G3 lacked one important feature that separated it from the x86

competition, though: vector computing capabilities. While comparable

PC processors supported SIMD in the form of Intel’s and AMD’s vector

extensions to the x86 instruction set, the G3 was stuck in the world of scalar computing. So when Motorola decided to develop the G3 into an even

more capable embedded and media workstation chip, this lack was the first

thing it addressed.

The PowerPC 7400 (aka the G4)

The Motorola MPC7400 (aka the G4) was designed as a media processing

powerhouse for desktops and portables. Apple Computer used the 7400 as

the CPU in the first version of their G4 workstation line, and this processor

was later replaced by a lower-power version—the 7410—before the 7450

(aka the G4+ or G4e) was introduced. Today, the successors to the 7400/7410

have seen widespread use as embedded processors, which means that they’re used in routers and other non-PC devices that need a microprocessor with

low power consumption and strong DSP capabilities. Table 6-5 lists the

features of the PowerPC 7400.

Table 6-5: Features of the PowerPC 7400

Introduction Date              September 1999
Process                        0.20 micron
Transistor Count               10.5 million
Die Size                       83 mm²
Clock Speed at Introduction    400–600 MHz
Cache Sizes                    64KB split L1, 2MB L2 supported via on-chip tags
First Appeared In              Power Macintosh G4

Figure 6-5 illustrates the PowerPC 7400 microarchitecture.

Except for the addition of SIMD capabilities, which we’ll discuss in the

next chapter, the G4 is essentially the same as the 750. Motorola’s technical

summary of the G4 has this to say about the G4 compared to the 750:

The design philosophy on the MPC7410 (and the MPC7400)

is to change from the MPC750 base only where required to

gain compelling multimedia and multiprocessor performance.

The MPC7410’s core is essentially the same as the MPC750’s,

except that whereas the MPC750 has a 6-entry completion queue

and has slower performance on some floating-point double-

precision operations, the MPC7410 has an 8-entry completion

queue and a full double-precision FPU. The MPC7410 also adds

the AltiVec instruction set, has a new memory subsystem, and can

interface to the improved MPX bus.


MPC7410 RISC Microprocessor Technical Summary, section 3.11.

[Figure 6-5 diagram. Front end: instruction fetch, branch unit (BU), instruction queue, decode/dispatch. Execution units, each fed by a reservation station: vector permute unit (VPU, 1 stage); vector ALU (VSIU, 1 stage; VCIU, 3 stages; VFPU, 4 stages); floating-point unit (FPU, 3 stages); integer units (IU1 and IU2, 1 stage each); load-store unit (LSU, 2 stages). Back end: completion queue, write, commit unit.]

Figure 6-5: Microarchitecture of the PowerPC 7400

Aside from the vector execution unit, the most important difference in

the back ends of the two units lies in the G4’s improved FPU. The G4’s FPU

is a full-blown double-precision FPU, and it does single- and double-precision

floating-point operations, including multiply and multiply-add, in three fully-

pipelined cycles.
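
To see why "fully pipelined" matters, here is a small worked sketch. It assumes a stream of independent floating-point operations, one new operation entering the unit per cycle, and no stalls; the function names are hypothetical. Under those assumptions, a three-stage pipelined unit finishes its first result after three cycles and one more result every cycle after that, while a unit that could not overlap operations would need three cycles for each one.

```python
def pipelined_cycles(num_ops, latency=3):
    """Cycles to finish num_ops independent operations on a fully pipelined unit.

    The first result appears after `latency` cycles; each later result
    follows one cycle behind the previous one.
    """
    if num_ops <= 0:
        return 0
    return latency + (num_ops - 1)


def unpipelined_cycles(num_ops, latency=3):
    """Cycles for the same work if each operation must finish before the next starts."""
    return max(num_ops, 0) * latency


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, pipelined_cycles(n), unpipelined_cycles(n))
```

The latency of any single operation is the same in both cases; what pipelining changes is throughput, which approaches one completed operation per cycle as the stream gets longer.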

With respect to the instruction window, the G4 has the same number

and configuration of reservation stations as the 750. (Note that the G4’s two

vector execution units, which were not present on the 750, each have a one-

entry reservation station.) The only difference is that the G4’s instruction

queue has been lengthened to eight entries from the 750’s original six as a

way of reducing dispatch bottlenecks.

The G4’s Vector Unit

In the late 1990s, Apple, Motorola, and IBM jointly developed a set of SIMD

extensions to the PowerPC instruction set for use in the PowerPC processor

series. These SIMD extensions went by different names: IBM called them

VMX, and Motorola called them AltiVec. This book will refer to these extensions

using Motorola’s AltiVec label.

The new AltiVec instructions, which I’ll cover in detail in Chapter 8, were

first introduced in the G4. The G4 executes these instructions in its vector

unit, which consists of two vector execution units: the vector ALU (VALU) and the vector permute unit (VPU). The VALU performs vector arithmetic and logical operations, while the VPU performs permute and shift operations on vectors.

To support the AltiVec instructions, which can operate on up to 128 bits of

data at a time, 32 new 128-bit vector registers were added to the PowerPC ISA.

On the G4, these 32 architectural registers are accompanied by 6 vector

rename registers.
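
The two kinds of work described above, lane-wise arithmetic and permutes on 128-bit registers, can be modeled in plain Python. This is a sketch for intuition only: the function names are hypothetical, a register is represented as a Python integer, lanes are numbered from the most significant end, and the permute follows the general shape of a two-source byte permute rather than serving as a specification of any particular AltiVec instruction.

```python
def split_lanes(value, lane_bits):
    """Split a 128-bit integer into lanes, most significant lane first."""
    count = 128 // lane_bits
    mask = (1 << lane_bits) - 1
    return [(value >> (lane_bits * (count - 1 - i))) & mask for i in range(count)]


def join_lanes(lanes, lane_bits):
    """Pack lanes (most significant first) back into one integer."""
    mask = (1 << lane_bits) - 1
    value = 0
    for lane in lanes:
        value = (value << lane_bits) | (lane & mask)
    return value


def vector_add_modulo(a, b, lane_bits):
    """Lane-wise add: each lane wraps on overflow, with no carry into its neighbor."""
    mask = (1 << lane_bits) - 1
    pairs = zip(split_lanes(a, lane_bits), split_lanes(b, lane_bits))
    return join_lanes([(x + y) & mask for x, y in pairs], lane_bits)


def vector_permute(a, b, control):
    """Byte permute: the low 5 bits of each control byte select one byte
    from the 32-byte concatenation of a and b."""
    source = split_lanes(a, 8) + split_lanes(b, 8)
    picked = [source[c & 0x1F] for c in split_lanes(control, 8)]
    return join_lanes(picked, 8)


if __name__ == "__main__":
    a = join_lanes([0xFFFFFFFF, 1, 2, 3], 32)
    b = join_lanes([1, 1, 1, 1], 32)
    # The same 128 bits can be treated as 4 x 32, 8 x 16, or 16 x 8 lanes.
    print(split_lanes(vector_add_modulo(a, b, 32), 32))
```

The first function pair is the kind of operation a vector ALU would handle, and the permute is the kind of data rearrangement a permute unit would handle; a single instruction does the work that scalar code would spread across a loop.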

Summary: The PowerPC G4 in Historical Context

The G4’s AltiVec instruction set was a hit, and it began to see widespread use

by Apple and by Motorola’s embedded customers. But there was still much

room for improvement to the G4’s AltiVec implementation. In particular, the

vector unit’s single VALU was tasked with handling all integer and floating-point

vector operations. Just as scalar code benefits from the presence of

multiple specialized scalar ALUs, vector performance could be improved by

splitting the burden of vector computation among multiple specialized VALUs

operating in parallel. Such an improvement would have to wait for the successor

to the G4, the G4e.

The major problem with the G4 was that its short, four-stage pipeline

severely limited the upward scalability of its clock rate. While Intel and AMD

were locked in the gigahertz race, Motorola’s G4 was stuck around the 500 MHz

mark for quite a long time. As a result, Apple’s x86 competitors soon surpassed it in both clock speed and performance, leaving what was once the most powerful commodity RISC workstation line in serious trouble with the market.

Conclusion

The 600 series saw the PPC line go from the new kid on the block to a mature

RISC alternative that brought Apple’s PowerMac workstation to the forefront

of personal computing performance. While the initial 601 had a few teething

problems, the line was in great shape after the 603e and 604e made it

to market. The 603e was a superb mobile chip that worked well in Apple’s

laptops, and even though it had a more limited instruction dispatch/commit

bandwidth and a smaller cache than the 601, it still managed to beat its

predecessor because of its more efficient use of transistors.

The 604 doubled the 603’s instruction dispatch and commit bandwidth,

and it sported a wider back end and a larger instruction window that enabled

its back end to grind through more instructions per clock. Furthermore, its

pipeline was deepened in order to increase the number of instructions per

clock and to allow for better clock speed scaling. The end result was that the

604 was a strong enough desktop chip to keep the PowerMac comfortably in

the performance game.

It’s important to remember, though, that the 600 series reigned at a time

when transistor budgets were still relatively small by today’s standards, so the PowerPC architecture’s RISC nature gave it a definite cost, performance, and

power consumption edge over the x86 competition. This is not to say that the 600 series was always in the performance lead; it wasn’t. The performance

crown changed hands a number of times during this period.

During the heyday of the 600 series and into the dawn of the G3 era, the

fact that PowerPC was a RISC ISA was a strong mark in the platform’s favor.

But as Moore’s Curves drove transistor counts and MHz numbers ever higher,

the relative cost of legacy x86 support began to go down and the PowerPC

ISA’s RISC advantage started to wane. By the time the 7400 hit the market,

x86 processors from Intel and AMD were already catching up to it in performance, and by the time the gigahertz race was over, Apple’s flagship workstation

line was in trouble. The 7400’s clock speed and performance had stagnated

for too long during a period when Intel and AMD were locked in a heated

price/performance competition.

Apple’s stop-gap solution to this problem was to turn to symmetric multiprocessing (SMP) in order to increase the performance of its desktop line. (See Chapter 12 for a more detailed discussion of SMP.) By offering

computers in which two G4s worked together to execute code and process

data, Apple hoped to pack more processing power into its computers in

a way that didn’t rely on Motorola to ramp up clock speeds. The dual G4

met with mixed success in the market, and it wasn’t until the debut of the

significantly redesigned PowerPC 7450 (aka G4+ or G4e) that Apple saw the

per-processor performance of its workstations improve. The introduction of

the G4e into its workstation line enabled Apple to recover some ground in its

race with its primary competitor in the PC space—systems based on Intel’s

Pentium 4.

Intel’s Pentium 4 vs. Motorola’s G4e: Approaches and Design Philosophies

Now that we’ve covered not only the microprocessor

basics but also the development of two popular x86

and PowerPC processor lines, you’re equipped to compare

and to understand two of the processors that have

been among the most popular examples of these two

lines: Intel’s Pentium 4 and Motorola’s G4e.

When the Pentium 4 hit the market in November 2000, it was the first

major new x86 microarchitecture from Intel since the 1995 introduction of the Pentium Pro. In the years prior to the Pentium 4’s launch, the Pentium

Pro’s P6 core dominated the market in its incarnations as the Pentium II and

Pentium III, and anyone who was paying attention during that time learned

at least one major lesson: Clock speed sells. Intel was definitely paying attention,

and as the Willamette team members labored away in Hillsboro, Oregon,

they kept MHz foremost in their minds. This singular focus is evident in everything

from Intel’s Pentium 4 promotional and technical literature down to

the very last detail of the processor’s design. As this chapter will show, the

successor to the most successful x86 microarchitecture of all time was a machine built from the ground up for stratospheric clock speed.

NOTE

Willamette was Intel’s code name for the Pentium 4 while the project was in development. Intel’s projects are usually code-named after rivers in Oregon. Many companies use code names that follow a certain convention, like Apple’s use of the names of large cats for versions of OS X.

Motorola introduced the MPC7450 in January 2001, and Apple quickly

adopted it under the G4 moniker. Because the 7450 represented a significant departure from the 7400, the 7450 was often referred to as the G4e or the

G4+, so throughout this chapter we’ll call it the G4e. The new processor had

a slightly deeper pipeline, which allowed it to scale to higher clock speeds, and both its front end and back end boasted a whole host of improvements that
