Figure 11.6
Code generation example focusing on the translation of the statement let balance = (balance + sum) - commission (sum * 5).
11.3 Implementation
We now turn to propose a software architecture for the overall compiler. This architecture builds upon the syntax analyzer described in chapter 10. In fact, the current architecture is based on gradually evolving the syntax analyzer into a full-scale compiler. The overall compiler can thus be constructed using five modules:
• JackCompiler: top-level driver that sets up and invokes the other modules;
• JackTokenizer: tokenizer;
• SymbolTable: symbol table;
• VMWriter: output module for generating VM code;
• CompilationEngine: recursive top-down compilation engine.
11.3.1 The
JackCompiler
Module
The compiler operates on a given source, where source is either a file name of the form Xxx.jack or a directory name containing one or more such files. For each Xxx . jack input file, the compiler creates a JackTokenizer and an output Xxx.vm file. Next, the compiler uses the
CompilationEngine,
SymbolTable, and VMWriter modules to write the output file.
11.3.2 The
JackTokenizer
Module
The tokenizer API was given in section 10.3.2.
11.3.3 The
SymbolTable
Module
This module provides services for creating and using a symbol table. Recall that each symbol has a scope from which it is visible in the source code. The symbol table implements this abstraction by giving each symbol a running number (index) within the scope. The index starts at 0, increments by 1 each time an identifier is added to the table, and resets to 0 when starting a new scope. The following kinds of identifiers may appear in the symbol table:
Static:
Scope: class.
Field:
Scope: class.
Argument: Scope:
subroutine (method/function/constructor).
Var:
Scope: subroutine (method/function/constructor).
When compiling error-free Jack code, any identifier not found in the symbol table may be assumed to be a subroutine name or a class name. Since the Jack language syntax rules suffice for distinguishing between these two possibilities, and since no “linking” needs to be done by the compiler, there is no need to keep these identifiers in the symbol table.
SymbolTable:
Provides a symbol table abstraction. The symbol table associates the identifier names found in the program with identifier properties needed for compilation: type, kind, and running index. The symbol table for Jack programs has two nested scopes (class/subroutine).
Implementation Tip
The symbol table abstraction and API can be implemented using two separate hash tables: one for the class scope and another one for the subroutine scope. When a new subroutine is started, the subroutine scope table can be cleared.
11.3.4 The
VMWriter
Module
VMWriter:
Emits VM commands into a file, using the VM command syntax.
11.3.5 The
CompilationEngine
Module
This class does the compilation itself. It reads its input from a JackTokenizer and writes its output into a VMWriter. It is organized as a series of compilexxx ( ) routines, where xxx is a syntactic element of the Jack language. The contract between these routines is that each compilexxx ( ) routine should read the syntactic construct xxx from the input, advance ( ) the tokenizer exactly beyond xxx, and emit to the output VM code effecting the semantics of xxx. Thus compilexxx ( ) may only be called if indeed xxx is the next syntactic element of the input. If xxx is a part of an expression and thus has a value, the emitted code should compute this value and leave it at the top of the VM stack.
The API of this module is identical to that of the syntax analyzer’s
compilation-Engine
module from chapter 10, and thus we suggest gradually morphing the syntax analyzer into a full compiler. Section 11.5 provides step-by-step instructions and test programs for this construction.
11.4 Perspective
The fact that Jack is a relatively simple language permitted us to sidestep several thorny compilation issues. For example, while Jack looks like a typed language, this is hardly the case. All of Jack’s data types are 16-bits long, and the language semantics allows Jack compilers to ignore almost all type information. As a result, when compiling and evaluating expressions, Jack compilers need not determine their types (with the single exception that compiling a method call x.m() requires determining the class type of x). Likewise, array entries in Jack are not typed. In contrast, most programming languages feature rich type systems that have significant implications on their compilers: Different amounts of memory must be allocated for different types of variables; conversion from one type into another requires specific language operations; the compilation of a simple expression like x+y depends strongly on the types of x and y; and so on.
Another significant simplification is that the Jack language does not support inheritance. This implies that all method calls can be handled statically, at compile-time. In contrast, compilers of languages with inheritance must treat methods as virtual, and determine their locations according to the run-time type of the underlying object. For example, consider the method call x.m ( ). If the language supports inheritance, x can be derived from more than one class, and we cannot know which until run-time. Thus, if the definition of the method m is not found in the class from which x was derived, it may still be found in a class that supersedes it, and so on.
Another common feature of object-oriented languages not supported by Jack is public class fields. For example, if circ is an object of type Circle with a property radius, one cannot write statements like r=circ . radius. Instead, the programmer must equip the Circle class with accessor methods, allowing only statements like r=circ . getRadius ( ) (which is good programming practice anyway).
The lack of real typing, inheritance, and public class fields allows a truly independent compilation of classes. In particular, a Jack class can be compiled without accessing the code of any other class: The fields of other classes are never referred to directly, and all linking to methods of other classes is “late” and done just by name.
Many other simplifications of the Jack language are not significant and can be relaxed with little effort. For example, one may easily extend the language with for and switch statements. Likewise, one can add the capability to assign constants like ʹcʹ to char type variables, which is presently not supported by the language. (To assign the constant ’c’ to a Jack char variable x, one must first assign “c” to a String variable, say s, and then use let x=s.charAt(0). Clearly, it would be nicer to simply say let x=’c’, as in Java).
Finally, as usual, we did not pay any attention to optimization. Consider the high-level statement c++. A naïve compiler may translate it into the series of low-level VM operations push c, push 1, add, pop c. Next, the VM implementation will translate each one of these VM commands into several machine-level instructions, resulting in a considerable chunk of code. At the same time, an optimized compiler will notice that we are dealing with nothing more than a simple increment, and translate it into, say, the two machine instructions @c followed by M=M+1 on the Hack platform. Of course this is just one example of the finesse expected from industrial-strength compilers. Therefore, time and space efficiency play an important role in the code generation part of compilers and compilation courses.
11.5 Project
Objective
Extend the syntax analyzer built in chapter 10 into a full-scale Jack compiler. In particular, gradually replace the software modules that generate passive XML code with software modules that generate executable VM code.
Resources
The main tool that you need is the programming language in which you will implement the compiler. You will also need an executable copy of the Jack operating system, as explained below. Finally, you will need the supplied VM Emulator, to test the code generated by your compiler on a set of test programs supplied by us.
Contract
Complete the Jack compiler implementation. The output of the compiler should be VM code designed to run on the virtual machine built in the projects in chapters 7 and 8. Use your compiler to compile all the Jack programs given here. Make sure that each translated program executes according to its documentation.
Stage 1: Symbol Table
We suggest that you start by building the compiler’s symbol table module and using it to extend the syntax analyzer built in Project 10. Presently, whenever an identifier is encountered in the program, say foo, the syntax analyzer outputs the XML line foo . Instead, have your analyzer output the following information as part of its XML output (using some format of your choice):
■ the identifier category (var, argument, static, field, class, subroutine);
■ whether the identifier is presently being defined (e.g., the identifier stands for a variable declared in a var statement) or used (e.g., the identifier stands for a variable in an expression);
■ whether the identifier represents a variable of one of the four kinds (var, argument, static, field), and the running index assigned to the identifier by the symbol table.
You may test your symbol table module and the preceding capability by running your (extended) syntax analyzer on the test Jack programs supplied in Project 10. Once the output of your extended syntax analyzer includes this information, it means that you have developed a complete executable capability to understand the semantics of Jack programs. At this stage you can make the switch to a full-scale compiler and start generating VM code instead of XML output. This can be done by gradually morphing the code of the extended syntax analyzer into a full compiler.
Stage 2: Code Generation