Authors: Jon Stokes
Tags: #Computers, #Systems Architecture, #General, #Microprocessors
by labeled white boxes (SIU, CIU, FPU, BU, etc.) that designate the type of
execution unit that’s modifying the code stream during the execution phase.
Notice also that the figure contains a slight shift in terminology that I should clarify before we move on.
Until now, I’ve been using the term ALU as synonymous with integer execution
unit. After the previous section, however, we know that a microprocessor does
arithmetic and logical operations on more than just integer data, so we have
to be more precise in our terminology. From now on, ALU is a general term for
any execution unit that performs arithmetic and logical operations on any type
of data. More specific labels will be used to identify the ALUs that handle
specific types of instructions and numerical data. For instance, an integer
execution unit (IU) is an ALU that executes integer arithmetic and logical
instructions, a floating-point execution unit (FPU) is an ALU that executes
floating-point arithmetic and logical instructions, and so on.
Figure 4-5 shows that the Pentium has two IUs—a simple integer unit (SIU)
and a complex integer unit (CIU)—and a single FPU.
Execution units can be organized logically into functional blocks for
ease of reference, so the two integer execution units can be referred
to collectively as the Pentium’s integer unit. The Pentium’s floating-point
unit consists of only a single FPU, but some processors have more than one FPU;
likewise with the load-store unit (LSU). The floating-point unit can consist
of two FPUs—FPU1 and FPU2—and the load-store unit can consist of LSU1
and LSU2. In both cases, we’ll often refer to “the FPU” or “the LSU” when we
mean all of the execution units in that functional block, taken as a group.
Many modern microprocessors also feature vector execution units, which
perform arithmetic and logical operations on vectors. I won’t describe vector
computing in detail here, however, because that discussion belongs in another
chapter.
Memory-Access Units
In almost all of the processors that we’ll cover in later chapters, you’ll see a pair of execution units that execute memory-access instructions: the load-store unit
and the branch execution unit. The load-store unit (LSU) is responsible for
the execution of load and store instructions, as well as for address generation.
As mentioned in Chapter 1, LSUs have small, stripped-down integer addition
hardware that can quickly perform the addition required to compute an address.
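A minimal sketch of this address-generation step (my own illustration, not tied to any real ISA) shows why only a small adder is needed: the effective address of a load like "load the word at offset 12 from the base register" is just one integer addition, wrapped to the machine's address width.

```python
# Hypothetical sketch of LSU address generation: effective address =
# base register contents + displacement, truncated to the address width.
def effective_address(base_reg_value, displacement, address_bits=16):
    """Compute base + displacement, wrapping to the machine's address width."""
    return (base_reg_value + displacement) % (1 << address_bits)

# If the base register holds 0x0100 and the displacement is 12, the
# load accesses memory location 0x010C.
addr = effective_address(0x0100, 12)
```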
The branch execution unit (BEU) is responsible for executing conditional
and unconditional branch instructions. The BEU of the DLW series reads
the processor status word as described in Chapter 1 and decides whether
or not to replace the program counter with the branch target. The BEU
also often has its own address generation unit for performing quick address
calculations as needed. We’ll talk more about the branch units of real-world
processors later on.
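The BEU's decision can be sketched in a few lines (an illustration of the idea, not code from any real processor): read one flag from the processor status word, then either substitute the branch target for the program counter or fall through to the next instruction.

```python
# Sketch of a BEU's conditional-branch decision. The flag names
# ("zero", "negative") are illustrative, not taken from a real ISA.
def next_pc(psw_flags, condition, branch_target, fall_through):
    """Return the address of the next instruction to execute."""
    taken = psw_flags.get(condition, False)
    return branch_target if taken else fall_through

# A jump-if-zero style branch: taken, because the zero flag is set.
taken_pc = next_pc({"zero": True}, "zero", branch_target=200, fall_through=101)
```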
Microarchitecture and the ISA
In the preceding discussion of superscalar execution, I made a number of
references to the discrepancy between the linear-execution, single-ALU
programming model that the programmer sees and what the superscalar
processor’s hardware actually does. It’s now time to flesh out that distinction
between the programming model and the actual hardware by introducing
some concepts and vocabulary that will allow us to talk with more precision
about the divisions between the apparent and the actual in computer
architecture.
Chapter 1 introduced the concept of the programming model as an
abstract representation of the microprocessor that exposes to the programmer
the microprocessor’s functionality. The DLW-1’s programming model con-
sisted of a single, integer-only ALU, four general-purpose registers, a program
counter, an instruction register, a processor status word, and a control unit.
The DLW-1’s instruction set consisted of a few instructions for working with
different parts of the programming model: arithmetic instructions (e.g., add
and sub) for the ALU and general-purpose registers (GPRs), load and store
instructions for manipulating the control unit and filling the GPRs with data,
and branch instructions for checking the PSW and changing the PC. We can
call this programmer-centric combination of programming model and
instruction set an instruction set architecture (ISA).
The DLW-1’s ISA was a straightforward reflection of its hardware, which
consisted of a single ALU, four GPRs, a PC, a PSW, and a control unit. In
contrast, the successor to the DLW-1, the DLW-2, contained a second ALU
that was invisible to the programmer and accessible only to the DLW-2’s
decode/dispatch logic. The DLW-2’s decode/dispatch logic would examine
pairs of integer arithmetic instructions to determine if they could safely be
executed in parallel (and hence out of sequential program order). If they
could, it would send them off to the two integer ALUs to be executed
simultaneously. Now, the DLW-2 has the same instruction set architecture as
the DLW-1—the instruction set and programming model remain unchanged—but the
DLW-2’s hardware implementation of that ISA is significantly different in
that the DLW-2 is superscalar.
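The pairing test that dispatch logic like the DLW-2's performs can be sketched as follows. This is my own simplified reconstruction, not the book's hardware: two integer instructions can issue together only if the second doesn't read the first's destination register and they don't write the same register.

```python
# Toy model of a dual-issue pairing check. Each instruction is a tuple
# (dest_reg, src_regs), e.g. add r3, r1, r2 -> ("r3", ("r1", "r2")).
def can_pair(first, second):
    """True if the two instructions may safely execute in parallel."""
    dest1, _ = first
    dest2, srcs2 = second
    if dest1 in srcs2:   # second needs first's result (read-after-write)
        return False
    if dest1 == dest2:   # both write the same register (write-after-write)
        return False
    return True

# Independent pair: both ALUs can work simultaneously.
ok = can_pair(("r3", ("r1", "r2")), ("r4", ("r1", "r5")))
# Dependent pair: the second reads r3, so program order must be kept.
not_ok = can_pair(("r3", ("r1", "r2")), ("r4", ("r3", "r5")))
```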
A particular processor’s hardware implementation of an ISA is generally
referred to as that processor’s microarchitecture. We might call the ISA
introduced with the DLW-1 the DLW ISA. Each successive iteration of our
hypothetical DLW line of computers—the DLW-1 and DLW-2—implements the
DLW ISA using a different microarchitecture. The DLW-1 has only one ALU,
while the DLW-2 is a two-way superscalar implementation of the DLW ISA.
Intel’s x86 hardware followed the same sort of evolution, with each
successive generation becoming more complex while the ISA stayed largely
unchanged. Regarding the Pentium’s inclusion of floating-point hardware,
you might be wondering how the programmer was able to use the floating-point
hardware (i.e., the FPU plus a floating-point register file) if the original
x86 ISA didn’t include any floating-point operations or specify any
floating-point registers. The Pentium’s designers had to make the following
changes to the ISA to accommodate the new functionality:
- First, they had to modify the programming model by adding an FPU and
  floating-point–specific registers.
- Second, they had to extend the instruction set by adding a new group of
  floating-point arithmetic instructions.
These types of ISA extensions are fairly common in the computing world.
Intel extended the original x86 instruction set to include the x87 floating-
point extensions. The x87 included an FPU and a stack-based floating-point
register file, but we’ll talk in more detail about the x87’s stack-based
architecture in the next chapter. Intel later extended x86 again with the
introduction of a vector-processing instruction set called MMX (multimedia
extensions), and again with the introduction of the SSE (streaming SIMD
extensions) and SSE2 instruction sets. (SIMD stands for single instruction,
multiple data and is another way of describing vector computing. We’ll cover
this in more detail in “The Vector Execution Units” on page 168.) Similarly,
Apple, Motorola, and IBM added a set of vector extensions to the PowerPC ISA
in the form of AltiVec, as the extensions are called by Motorola, or VMX,
as they’re called by IBM.
A Brief History of the ISA
Back in the early days of computing, computer makers like IBM didn’t build
a whole line of software-compatible computer systems and aim each system
at a different price/performance point. Instead, each of a manufacturer’s
systems was like each of today’s game consoles, at least from a programmer’s
perspective—programmers wrote directly to the machine’s unique hardware,
with the result that a program written for one machine would run neither on
competing machines nor on other machines from a different product line
put out by the same manufacturer. Just as a Nintendo 64 will run neither
PlayStation games nor older SNES games, programs written for one
circa-1960 machine wouldn’t run on any machine but that one particular
product from that one particular manufacturer. The programming model
was different for each machine, and the code was fitted directly to the hard-
ware like a key fits a lock (see Figure 4-6).
Figure 4-6: Software was custom-fitted to each generation of hardware
The problems this situation posed are obvious. Every time a new machine
came out, software developers had to start from scratch. You couldn’t reuse
programs, and programmers had to learn the intricacies of each new piece
of hardware in order to code for it. This cost quite a bit of time and money,
making software development a very expensive undertaking. This situation
presented computer system designers with the following problem: How do
you expose (make available) the functionality of a range of related hardware
systems in a way that allows software to be easily developed for and ported
between those systems? IBM solved this problem in the 1960s with the launch
of the IBM System/360, which ushered in the era of modern computer
architecture. The System/360 introduced the concept of the ISA as a layer
of abstraction—or an interface, if you will—separated from a particular
processor’s microarchitecture (see Figure 4-7). This means that the infor-
mation the programmer needed to know to program the machine was
abstracted from the actual hardware implementation of that machine.
Once the design and specification of the instruction set, or the set of
instructions available to a programmer for writing programs, was separated
from the low-level details of a particular machine’s design, programs written
for a particular ISA could run on any machine that implemented that ISA.
Thus the ISA provided a standardized way to expose the features of a
system’s hardware that allowed manufacturers to innovate and fine-tune that
hardware for performance without worrying about breaking the existing
software base. You could release a first-generation product with a particular
ISA, and then work on speeding up the implementation of that same ISA for
the second-generation product, which would be backward-compatible with
the first generation. We take all this for granted now, but before the IBM
System/360, binary compatibility between different machines of different
generations didn’t exist.
Figure 4-7: The ISA sits between the software and the hardware, providing a
consistent interface to the software across hardware generations.
The blue layer in Figure 4-7 simply represents the ISA as an abstract
model of a machine for which a programmer writes programs. As mentioned
earlier, the technical innovation that made this abstract layer possible was
something called the microcode engine. A microcode engine is sort of like a
CPU within a CPU. It consists of a tiny bit of storage, the microcode ROM,
which holds microcode programs, and an execution unit that executes those
programs. The job of each of these microcode programs is to translate a
particular instruction into a series of commands that controls the internal
parts of the chip. When a System/360 instruction is executed, the microcode
unit reads the instruction in, accesses the portion of the microcode ROM
where that instruction’s corresponding microcode program is located, and
then produces a sequence of machine instructions, in the processor’s internal
instruction format, that orchestrates the dance of memory accesses and
functional unit activations that actually does the number crunching (or
whatever else) the architectural instruction has commanded the machine to do.
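Conceptually, the microcode engine is a lookup-and-playback mechanism, which can be modeled in a few lines. This is a toy illustration of the idea only: the opcode and internal command names below are invented, not drawn from the System/360 or any other real machine.

```python
# Toy model of a microcode engine: a ROM maps each architectural opcode
# to a little program of internal commands, which the engine plays back.
MICROCODE_ROM = {
    # Hypothetical micro-programs; all names here are made up.
    "ADD_MEM": ["gen_address", "read_memory", "alu_add", "write_register"],
    "STORE":   ["gen_address", "read_register", "write_memory"],
}

def decode(architectural_opcode):
    """Return the sequence of internal commands for one ISA instruction."""
    return MICROCODE_ROM[architectural_opcode]

# One ISA-level instruction expands into four internal commands.
steps = decode("ADD_MEM")
```

The hardware behind the ROM can change completely between product generations; as long as the micro-programs are rewritten to match, the architectural opcodes keep working unchanged.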
By decoding instructions this way, all programs are effectively running in
emulation. This means that the ISA represents a sort of idealized model,
emulated by the underlying hardware, on the basis of which programmers can
design applications. This emulation means that between iterations of a
product line, a vendor can change the way their CPU executes a program; all
they have to do is rewrite the microcode programs, and the programmer never
has to be aware of the hardware differences because the ISA hasn’t changed
a bit. Microcode engines still show up in modern
CPUs. AMD’s Athlon processor uses one for the part of its decoding path that