The Elements of Computing Systems: Building a Modern Computer from First Principles

Authors: Noam Nisan, Shimon Schocken

■ (Symbol): This pseudo-command binds Symbol to the memory location into which the next command in the program will be stored. It is called a “pseudo-command” since it generates no machine code.

(The remaining conventions in this section pertain to assembly programs only.)

Constants and Symbols
Constants must be non-negative and are written in decimal notation. A user-defined symbol can be any sequence of letters, digits, underscore (_), dot (.), dollar sign ($), and colon (:) that does not begin with a digit.
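A minimal sketch of these two lexical rules, assuming "letters" means the ASCII letters A-Z and a-z (the text does not say whether other letters are allowed). The helper names `is_valid_symbol` and `is_constant` are hypothetical.

```python
import re

# Assumption: "letters" = ASCII letters only.
# First character: letter, _, ., $, or : (never a digit); then also digits.
SYMBOL_RE = re.compile(r"[A-Za-z_.$:][A-Za-z0-9_.$:]*")
# Constants: one or more decimal digits; no sign, so never negative.
CONSTANT_RE = re.compile(r"[0-9]+")


def is_valid_symbol(token: str) -> bool:
    """True if token is a legal user-defined symbol."""
    return SYMBOL_RE.fullmatch(token) is not None


def is_constant(token: str) -> bool:
    """True if token is a non-negative decimal constant."""
    return CONSTANT_RE.fullmatch(token) is not None
```

For example, `ball.x`, `$tmp`, and `_i:2` pass `is_valid_symbol`, while `2abc` does not; `100` passes `is_constant`, while `-1` does not.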

Comments
Text beginning with two slashes (//) and ending at the end of the line is considered a comment and is ignored.

White Space
Space characters are ignored. Empty lines are ignored.
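Taken together, the comment and white-space rules amount to a small per-line cleanup step. The sketch below (the name `clean_line` is hypothetical) cuts everything from `//` onward and then deletes all white space. It also drops tabs and newlines, which goes slightly beyond the stated rule about space characters.

```python
def clean_line(line: str) -> str:
    """Strip a // comment and all white space from one source line.

    Returns "" for empty or comment-only lines so the caller can skip them.
    """
    comment_start = line.find("//")
    if comment_start != -1:
        line = line[:comment_start]
    # str.split() with no argument splits on any run of white space,
    # so joining the pieces removes spaces, tabs, and the newline.
    return "".join(line.split())


if __name__ == "__main__":
    source = ["// Adds 1+...+100", "", "   @i   // a variable", "   M = 1", "(LOOP)"]
    commands = [c for c in (clean_line(s) for s in source) if c]
    print(commands)  # ['@i', 'M=1', '(LOOP)']
```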

Case Conventions
All the assembly mnemonics must be written in uppercase. Everything else (user-defined labels and variable names) is case sensitive. By convention, labels are written in uppercase and variable names in lowercase.

6.2.2 Instructions

The Hack machine language consists of two instruction types: the addressing instruction (A-instruction) and the compute instruction (C-instruction). The instruction format is as follows.
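For reference, and stated from the Hack specification rather than recovered from this text: an A-instruction is written `@value` and encodes as a 0 bit followed by the 15-bit value. A C-instruction is written `dest=comp;jump`, where either `dest=` or `;jump` may be omitted, and encodes as `111 a c1..c6 d1d2d3 j1j2j3`. A minimal sketch of the A-instruction encoding and of splitting a C-instruction into its three fields (the function names are hypothetical):

```python
from typing import Tuple


def encode_a(value: int) -> str:
    """Encode an A-instruction (@value) as a 16-bit binary string.

    The leading bit is 0, leaving 15 bits, so value must be in 0..32767.
    """
    if not 0 <= value < 2 ** 15:
        raise ValueError(f"A-instruction value out of range: {value}")
    return format(value, "016b")


def split_c_fields(command: str) -> Tuple[str, str, str]:
    """Split 'dest=comp;jump' into (dest, comp, jump).

    A missing dest or jump field is returned as the empty string.
    """
    dest, sep, rest = command.partition("=")
    if not sep:  # no '=' means the command has no dest field
        dest, rest = "", command
    comp, _, jump = rest.partition(";")
    return dest, comp, jump


if __name__ == "__main__":
    print(encode_a(16))             # 0000000000010000
    print(split_c_fields("D=M"))    # ('D', 'M', '')
    print(split_c_fields("0;JMP"))  # ('', '0', 'JMP')
```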

The translation of each of the three fields, comp, dest, and jump, into its binary form is specified in the following three tables.
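The three tables can be held as lookup dictionaries. The sketch below assumes the standard Hack encodings, in which each comp code is the a-bit followed by six c-bits, and it accepts only the canonical spellings listed (for example `AM`, not `MA`). The names `DEST`, `JUMP`, `COMP`, and `encode_c` are hypothetical.

```python
# dest and jump codes: 3 bits each; "" stands for an omitted field.
DEST = {"": "000", "M": "001", "D": "010", "MD": "011",
        "A": "100", "AM": "101", "AD": "110", "AMD": "111"}
JUMP = {"": "000", "JGT": "001", "JEQ": "010", "JGE": "011",
        "JLT": "100", "JNE": "101", "JLE": "110", "JMP": "111"}

# comp codes: the a-bit (0 = use A, 1 = use M) followed by c1..c6.
COMP = {
    "0": "0101010", "1": "0111111", "-1": "0111010",
    "D": "0001100", "A": "0110000", "!D": "0001101",
    "!A": "0110001", "-D": "0001111", "-A": "0110011",
    "D+1": "0011111", "A+1": "0110111", "D-1": "0001110",
    "A-1": "0110010", "D+A": "0000010", "D-A": "0010011",
    "A-D": "0000111", "D&A": "0000000", "D|A": "0010101",
    "M": "1110000", "!M": "1110001", "-M": "1110011",
    "M+1": "1110111", "M-1": "1110010", "D+M": "1000010",
    "D-M": "1010011", "M-D": "1000111", "D&M": "1000000",
    "D|M": "1010101",
}


def encode_c(dest: str, comp: str, jump: str) -> str:
    """Build the 16-bit C-instruction: 111, then comp (7), dest (3), jump (3)."""
    return "111" + COMP[comp] + DEST[dest] + JUMP[jump]
```

For example, `D=M` corresponds to `encode_c("D", "M", "")`, which returns `1111110000010000`, and `0;JMP` to `encode_c("", "0", "JMP")`, which returns `1110101010000111`. An unknown mnemonic raises `KeyError`, which a fuller assembler would turn into a readable error message.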

6.2.3 Symbols

Hack assembly commands can refer to memory locations (addresses) using either constants or symbols. Symbols in assembly programs arise from three sources.

Predefined Symbols
Any Hack assembly program is allowed to use the following predefined symbols.

Note that each of the first five RAM locations, RAM[0] through RAM[4], can be referred to using two predefined symbols. For example, either R2 or ARG can be used to refer to RAM[2].
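A sketch of a symbol table seeded with the predefined symbols, assuming the standard Hack platform values: SP, LCL, ARG, THIS, and THAT name RAM[0] to RAM[4]; R0 to R15 name RAM[0] to RAM[15]; and SCREEN and KBD are the base addresses of the memory-mapped screen (16384, 0x4000) and keyboard (24576, 0x6000). The class and method names are illustrative assumptions, not a prescribed API.

```python
class SymbolTable:
    """Maps symbols to addresses; starts out holding the predefined symbols."""

    def __init__(self) -> None:
        self._table = {"SP": 0, "LCL": 1, "ARG": 2, "THIS": 3, "THAT": 4,
                       "SCREEN": 16384, "KBD": 24576}
        # R0..R15 name RAM[0]..RAM[15]; R0..R4 alias SP..THAT.
        self._table.update({f"R{i}": i for i in range(16)})

    def add_entry(self, symbol: str, address: int) -> None:
        """Bind a new label or variable to an address."""
        self._table[symbol] = address

    def contains(self, symbol: str) -> bool:
        return symbol in self._table

    def get_address(self, symbol: str) -> int:
        return self._table[symbol]
```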

Label Symbols
The pseudo-command (Xxx) defines the symbol Xxx to refer to the instruction memory location holding the next command in the program. A label can be defined only once and can be used anywhere in the assembly program, even before the line in which it is defined.
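Because a label may be used before the line that defines it, an assembler typically makes a first pass over the program only to record label addresses. A minimal sketch, assuming the commands have already been stripped of comments and white space (the function name `collect_labels` is hypothetical):

```python
from typing import Dict, List


def collect_labels(commands: List[str]) -> Dict[str, int]:
    """First pass: map each (Xxx) label to the ROM address of the next command.

    Label pseudo-commands generate no code, so only A- and C-instructions
    advance the instruction-memory counter.
    """
    labels: Dict[str, int] = {}
    rom_address = 0
    for cmd in commands:
        if cmd.startswith("(") and cmd.endswith(")"):
            name = cmd[1:-1]
            if name in labels:  # a label can be defined only once
                raise ValueError(f"label defined twice: {name}")
            labels[name] = rom_address
        else:
            rom_address += 1
    return labels


if __name__ == "__main__":
    prog = ["@i", "M=1", "(LOOP)", "@i", "D=M", "@LOOP", "0;JMP",
            "(END)", "@END", "0;JMP"]
    print(collect_labels(prog))  # {'LOOP': 2, 'END': 6}
```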

Variable Symbols
Any symbol Xxx appearing in an assembly program that is not predefined and is not defined elsewhere using the (Xxx) command is treated as a variable. Variables are mapped to consecutive memory locations as they are first encountered, starting at RAM address 16 (0x0010).
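A sketch of how a second pass might resolve `@Xxx` operands, assuming `table` already holds the predefined symbols and the labels found in the first pass. Constants pass through unchanged, known symbols are looked up, and each new symbol receives the next free RAM address, starting at 16. The function name and signature are hypothetical.

```python
from typing import Dict, List


def resolve_a_operands(operands: List[str], table: Dict[str, int],
                       first_free: int = 16) -> List[int]:
    """Turn each A-instruction operand into a numeric address.

    `table` is updated in place with newly allocated variables, so a
    variable seen again later resolves to the same address.
    """
    next_free = first_free
    addresses: List[int] = []
    for op in operands:
        if op.isdecimal():      # a decimal constant
            addresses.append(int(op))
        elif op in table:       # predefined symbol, label, or known variable
            addresses.append(table[op])
        else:                   # first appearance of a variable
            table[op] = next_free
            addresses.append(next_free)
            next_free += 1
    return addresses
```

For example, with `table = {"R0": 0, "LOOP": 2}`, the operands `["i", "sum", "LOOP", "i", "100", "R0"]` resolve to `[16, 17, 2, 16, 100, 0]`.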

6.2.4 Example

Chapter 4 presented a program that sums up the integers 1 to 100. Figure 6.2 repeats this example, showing both its assembly and binary versions.

Figure 6.2
Assembly and binary representations of the same program.

6.3 Implementation

The Hack assembler reads as input a text file named Prog.asm, containing a Hack assembly program, and produces as output a text file named Prog.hack, containing the translated Hack machine code. The name of the input file is supplied to the assembler as a command-line argument.
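A minimal sketch of the command-line handling, assuming a hypothetical script named `assembler.py`. Only the Prog.asm to Prog.hack naming rule comes from the text; the translation step itself is left out here.

```python
import sys
from pathlib import Path


def hack_output_path(asm_path: str) -> Path:
    """Derive Prog.hack from Prog.asm, in the same directory as the input."""
    path = Path(asm_path)
    if path.suffix != ".asm":
        raise ValueError(f"expected a .asm file, got {asm_path!r}")
    return path.with_suffix(".hack")


if __name__ == "__main__":
    # Hypothetical invocation: python assembler.py Prog.asm
    if len(sys.argv) == 2:
        print(f"translating {sys.argv[1]} -> {hack_output_path(sys.argv[1])}")
    else:
        print("usage: python assembler.py Prog.asm", file=sys.stderr)
```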

The translation of each individual assembly command to its equivalent binary instruction is direct and one-to-one. Each command is translated separately. In particular, each mnemonic component (field) of the assembly command is translated into its corresponding bit code according to the tables in section 6.2.2, and each symbol in the command is resolved to its numeric address as specified in section 6.2.3.

We propose an assembler implementation based on four modules: a Parser module that parses the input, a Code module that provides the binary codes of all the assembly mnemonics, a SymbolTable module that handles symbols, and a main program that drives the entire translation process.

A Note about API Notation
The assembler development is the first in a series of five software construction projects that build our hierarchy of translators (assembler, virtual machine, and compiler). Since readers can develop these projects in the programming language of their choice, we base our proposed implementation guidelines on language-independent APIs. A typical project API describes several modules, each containing one or more routines. In object-oriented languages like Java, C++, and C#, a module usually corresponds to a class, and a routine usually corresponds to a method. In procedural languages, routines correspond to functions, subroutines, or procedures, and modules correspond to collections of routines that handle related data. In some languages (e.g., Modula-2) a module may be expressed explicitly, in others implicitly (e.g., a file in the C language), and in others (e.g., Pascal) it will have no corresponding language construct and will just be a conceptual grouping of routines.

6.3.1 The Parser Module

The main function of the parser is to break each assembly command into its underlying components (fields and symbols). The API is as follows.

Parser:
Encapsulates access to the input code. Reads an assembly language command, parses it, and provides convenient access to the command’s components (fields and symbols). In addition, removes all white space and comments.
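A sketch of such a parser. The method names (`has_more_commands`, `advance`, `command_type`, `symbol`, `dest`, `comp`, `jump`) and the three command-type tags are assumptions chosen for illustration.

```python
from typing import List

A_COMMAND, C_COMMAND, L_COMMAND = "A", "C", "L"


class Parser:
    """Walks over assembly commands and exposes their fields."""

    def __init__(self, lines: List[str]) -> None:
        self._commands: List[str] = []
        for line in lines:
            line = line.split("//", 1)[0]  # drop the comment, if any
            line = "".join(line.split())   # drop all white space
            if line:                       # skip empty lines
                self._commands.append(line)
        self._index = -1
        self.current = ""

    def has_more_commands(self) -> bool:
        return self._index + 1 < len(self._commands)

    def advance(self) -> None:
        """Make the next command current; call only if has_more_commands()."""
        self._index += 1
        self.current = self._commands[self._index]

    def command_type(self) -> str:
        if self.current.startswith("@"):
            return A_COMMAND
        if self.current.startswith("("):
            return L_COMMAND
        return C_COMMAND

    def symbol(self) -> str:
        """The Xxx of @Xxx or (Xxx); meaningful only for A and L commands."""
        if self.command_type() == A_COMMAND:
            return self.current[1:]
        return self.current[1:-1]

    def dest(self) -> str:
        return self.current.split("=", 1)[0] if "=" in self.current else ""

    def comp(self) -> str:
        body = self.current.split("=", 1)[-1]  # text after '=', or all of it
        return body.split(";", 1)[0]

    def jump(self) -> str:
        return self.current.split(";", 1)[1] if ";" in self.current else ""
```

A driver would loop with `while p.has_more_commands(): p.advance()` and dispatch on `p.command_type()`. Keeping the fields as strings leaves the mnemonic-to-bits translation to a separate module.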

6.3.2 The Code Module
