Compare commits

...

10 Commits

Author SHA1 Message Date
Simon From Jakobsen
bb0b3ddcc3 Add to chapter 6 2024-10-25 11:04:12 +00:00
51e7bb9401 Add chapter 6 2024-10-25 02:21:29 +02:00
e94f43ab27 Fixes in chapter 1 2024-10-24 23:48:57 +02:00
Simon From Jakobsen
6f7d023408 Fixes in code 2024-10-22 11:23:39 +00:00
Simon From Jakobsen
80c6d14249 Fixes in code 2024-10-22 10:17:30 +00:00
Simon From Jakobsen
cf28a22ddf Actually add chapter 5 2024-10-22 07:49:10 +00:00
112b3b19c8 Fix code in chapters 2024-10-22 01:38:07 +02:00
79f862f941 Fixes in chapter 4 2024-10-21 23:48:34 +02:00
24c26fd8b0 Fixes in chapter 2 2024-10-21 23:48:30 +02:00
e04d421f70 Fixes in chapter 3 2024-10-21 22:37:45 +02:00
6 changed files with 520 additions and 200 deletions

View File

@ -170,18 +170,24 @@ Using some if-statement and loops, this implementation goes through each `i` cha
A visualization of how this code would lex the expression `+ 12 34` could look like this:
```
text i state tokens
text i state tokens
+ 12 0 make Plus []
+ 12 34 0 make Plus []
^
+ 12 1 skip whitespace [Plus]
+ 12 34 1 skip whitespace [Plus]
^
+ 12 2 make Int [Plus]
+ 12 34 2 make Int [Plus]
^
+ 12 3 make Int [Plus]
+ 12 34 3 make Int [Plus]
^
+ 12 4 done [Plus Int(12)]
+ 12 34 4 skip whitespace [Plus Int(12)]
^
+ 12 34 5 make Int [Plus Int(12)]
^
+ 12 34 6 make Int [Plus Int(12)]
^
+ 12 34 7 done [Plus Int(12) Int(34)]
^
```
#### Exercises

View File

@ -68,7 +68,7 @@ I'll add 3 functions for iterating through characters of the text:
class Lexer {
// ...
private step() { /*...*/ }
private done(): bool { return this.index >= this.text.length; }
private done(): boolean { return this.index >= this.text.length; }
private current(): string { return this.text[this.index]; }
// ...
}
@ -127,11 +127,8 @@ And a method for creating valueless tokens:
class Lexer {
// ...
private token(type: string, pos: Pos): Token {
return {
index: this.index,
line: this.line,
col: this.col,
};
const length = this.index - pos.index;
return { type, pos, length };
}
// ...
}
@ -142,11 +139,11 @@ And a method for testing/matching the `.current()` against a regex pattern or st
```ts
class Lexer {
// ...
private test(pattern: RegExp | string): Token {
private test(pattern: RegExp | string): boolean {
if (typeof pattern === "string")
return this.current === pattern;
return this.current() === pattern;
else
return pattern.test(this.current);
return pattern.test(this.current());
}
// ...
}
@ -164,7 +161,7 @@ class Lexer {
// ...
console.error(`Lexer: illegal character '${this.current()}' at ${pos.line}:${pos.col}`);
this.step();
return next();
return this.next();
}
// ...
}
@ -182,21 +179,13 @@ We don't need to know anything about whitespace, so we'll skip over it without m
class Lexer {
// ...
public next(): Token | null {
if (this.done())
return null;
const pos = this.pos();
// ...
if (this.test(/[ \t\n]/)) {
while (!this.done() && this.test(/[ \t\n]/))
this.step();
return next();
return this.next();
}
// ...
console.error(
`Lexer: illegal character '${this.current()}'`
+ ` at ${pos.line}:${pos.col}`,
);
this.step();
return next();
}
// ...
}
@ -216,7 +205,7 @@ class Lexer {
if (this.test("#")) {
while (!this.done() && !this.test("\n"))
this.step();
return next();
return this.next();
}
// ...
}
@ -408,7 +397,8 @@ const text = `
const lexer = new Lexer(text);
let token = lexer.next();
while (token !== null) {
console.log(`Lexed ${token}`);
const value = token.identValue ?? token.intValue ?? token.stringValue ?? "";
console.log(`Lexed ${token}(${value})`);
token = lexer.next();
}
```

View File

@ -1,7 +1,7 @@
# 3 Parser
In this chaper I'll show how I would make a parser.
In this chapter I'll show how I would make a parser.
A parser, in addition to our lexer, transforms the input program as text, meaning an unstructured sequence of characters, into a structered representation. Structured meaning the representation tells us about the different constructs such as if statements and expressions.
@ -9,7 +9,7 @@ A parser, in addition to our lexer, transforms the input program as text, meanin
The result of parsing is a tree structure representing the input program.
This structure is a recursive acyclic structure storing the different parts of the program.
This structure is a recursive structure storing the different parts of the program.
This is how I would define an AST data type.
@ -23,7 +23,7 @@ type Stmt = {
type StmtKind =
| { type: "error" }
// ...
| { type: "let", ident: string, value: Expr }
| { type: "return", expr?: Expr }
// ...
;
@ -62,7 +62,7 @@ class Parser {
}
// ...
private step() { this.currentToken = this.lexer.next() }
private done(): bool { return this.currentToken == null; }
private done(): boolean { return this.currentToken == null; }
private current(): Token { return this.currentToken!; }
// ...
}
@ -95,14 +95,14 @@ class Parser {
}
```
The parser does not need to keep track of `index`, `line` and `col` as those are stored in the tokens. The token's position is prefered to the lexer's.
The parser does not need to keep track of `index`, `line` and `col` as those are stored in the tokens. The token's position is preferred to the lexer's.
Also like the lexer, we'll have a `.test()` method in the parser, which will test for token type rather than strings or regex.
```ts
class Parser {
// ...
private test(type: string): bool {
private test(type: string): boolean {
return !this.done() && this.current().type === type;
}
// ...
@ -151,7 +151,7 @@ class Parser {
## 3.3 Operands
Operands are the individual parts of an operation. For example, in the math expression `a + b`, (would be `+ a b` in the input language), `a` and `b` are the *operands*, while `+` is the *operator*. In the expression `a + b * c`, the operands are `a`, `b` and `c`. But in the expression `a * (b + c)`, the operands of the multiply operation are `a` and `(b + c)`. `(b + c)` is an operands, because it is enclosed on both sides. This is how we'll define operands.
Operands are the individual parts of an operation. For example, in the math expression `a + b`, (would be `+ a b` in the input language), `a` and `b` are the *operands*, while `+` is the *operator*. In the expression `a + b * c`, the operands are `a`, `b` and `c`. But in the expression `a * (b + c)`, the operands of the multiply operation are `a` and `(b + c)`. `(b + c)` is a singular operand, because it is enclosed on both sides. This is how we'll define operands.
We'll make a public method in `Parser` called `parseOperand`.
@ -189,17 +189,17 @@ class Parser {
public parseOperand(): Expr {
// ...
if (this.test("ident")) {
const value = this.current().identValue;
const value = this.current().identValue!;
this.step();
return this.expr({ type: "ident", value }, pos);
}
if (this.test("int")) {
const value = this.current().intValue;
const value = this.current().intValue!;
this.step();
return this.expr({ type: "int", value }, pos);
}
if (this.test("string")) {
const value = this.current().stringValue;
const value = this.current().stringValue!;
this.step();
return this.expr({ type: "string", value }, pos);
}
@ -327,7 +327,7 @@ class Parser {
this.report("expected ident");
return this.expr({ type: "error" }, pos);
}
const value = this.current().identValue;
const value = this.current().identValue!;
this.step();
subject = this.expr({ type: "field", subject, value }, pos);
continue;
@ -365,7 +365,7 @@ class Parser {
if (this.test("[")) {
this.step();
const value = this.parseExpr();
if (!this.test("]") {
if (!this.test("]")) {
this.report("expected ']'");
return this.expr({ type: "error" }, pos);
}
@ -405,7 +405,7 @@ class Parser {
if (this.test("(")) {
this.step();
let args: Expr[] = [];
if (!this.test(")") {
if (!this.test(")")) {
args.push(this.parseExpr());
while (this.test(",")) {
this.step();
@ -414,7 +414,6 @@ class Parser {
args.push(this.parseExpr());
}
}
const value = this.parseExpr();
if (!this.test(")") {
this.report("expected ')'");
return this.expr({ type: "error" }, pos);
@ -431,10 +430,10 @@ class Parser {
}
```
Similarly to index epxressions, if we find a `(`-token, we step over it, parse the arguments, check for a `)` and replace `subject` with a call expression containing the previous `subject`.
Similarly to index expressions, if we find a `(`-token, we step over it, parse the arguments, check for a `)` and replace `subject` with a call expression containing the previous `subject`.
When parsing the arguments, we start by testing if we've reached a `)` to check if there are any arguments. If not, we parse the first argument.
The consecutive arguments are all preceded by a `,`-token. There we test or `,`, to check if we should keep parsing arguments. After checking for a seperating `,`, we check if we've reached a `)` and break if so. This is to allow for trailing comma.
The consecutive arguments are all preceded by a `,`-token. There we test for `,`, to check if we should keep parsing arguments. After checking for a seperating `,`, we check if we've reached a `)` and break if so. This is to allow for trailing comma.
```ts
func(
@ -445,7 +444,7 @@ func(
## 3.5 Prefix expressions
Contrasting postfix expressions, prefix expression are operations where the operator comes first, then the operands are listed. In some languages, operations such as negation (eg. `-value`) and not-operations (eg. `!value`) are prefix operations. In the language we're making, all binary and unary arithmetic operations are prefix. This includes both expressions with a single operand, such as not (eg. `not value`), but also expressions with 2 operands, such ass addition (eg. `+ a b`) and equation (eg. `== a b`).
Contrasting postfix expressions, prefix expression are operations where the operator comes first, then the operands are listed. In some languages, operations such as negation (eg. `-value`) and not-operations (eg. `!value`) are prefix operations. In the language we're making, all binary and unary arithmetic operations are prefix. This includes both expressions with a single operand, such as not (eg. `not value`), but also expressions with 2 operands, such as addition (eg. `+ a b`) and equation (eg. `== a b`).
This is because infix operators (eg. `a + b`) makes parsing more complicated, as it requires reasoning about operator precedence, eg. why `2 + 3 * 4 != (2 + 3) * 4`.
@ -642,7 +641,7 @@ class Parser {
public parseBreak(): Stmt {
const pos = this.pos();
this.step();
if (!this.test(";")) {
if (this.test(";")) {
return this.stmt({ type: "break" }, pos);
}
const expr = this.parseExpr();
@ -672,7 +671,7 @@ class Parser {
public parseReturn(): Stmt {
const pos = this.pos();
this.step();
if (!this.test(";")) {
if (this.test(";")) {
return this.stmt({ type: "return" }, pos);
}
const expr = this.parseExpr();
@ -715,27 +714,25 @@ class Parser {
this.report("expected ident");
return this.stmt({ type: "error" }, pos);
}
const ident = this.current().identValue;
const ident = this.current().identValue!;
this.step();
if (!this.test("(")) {
this.report("expected '('");
return this.stmt({ type: "error" }, pos);
}
const params = this.parseFnParams();
if (!params.ok)
return this.stmt({ type: "error" }, pos);
if (!this.test("{")) {
this.report("expected block");
return this.stmt({ type: "error" }, pos);
}
const body = this.parseBlock();
return this.stmt({ type: "fn", ident, params: params.value, body }, pos);
return this.stmt({ type: "fn", ident, params, body }, pos);
}
// ...
}
```
We first step over the initial `fn`-token. Then we grap the value of an `ident`-token. Then we check for a `(` and call `.parseFnParams()` to parse the parameters, including the encapsulating `(` and `)`. Then we check for and parse a block. And then we return the statement.
We first step over the initial `fn`-token. Then we grab the value of an `ident`-token. Then we check for a `(` and call `.parseFnParams()` to parse the parameters, including the encapsulating `(` and `)`. Then we check for and parse a block. And then we return the statement.
Then we define the `.parseFnParams()` method.
@ -783,7 +780,7 @@ class Parser {
public parseParam(): { ok: true, value: Param } | { ok: false } {
const pos = this.pos();
if (this.test("ident")) {
const ident = self.current().value;
const ident = this.current().identValue!;
this.step();
return { ok: true, value: { ident, pos } };
}
@ -830,7 +827,7 @@ class Parser {
}
```
We step over the first `let`-token. Then we parse a parameter using the `.parseParam()` method. If it fails, we return an error statement. Then we check for and step over a `=`-token. We then parse an expressions. And lastly return a let statement with the `ident` and `value`.
We step over the first `let`-token. Then we parse a parameter using the `.parseParam()` method. If it fails, we return an error statement. Then we check for and step over a `=`-token. We then parse an expression. And lastly return a let statement with the `ident` and `value`.
## 3.14 Assignment and expression statements
@ -915,6 +912,7 @@ class Parser {
// ...
while (!this.done()) {
if (this.test("}")) {
this.step();
return this.expr({ type: "block", stmts }, pos);
// ...
}
@ -966,6 +964,7 @@ class Parser {
}
// ...
private parseSingleLineBlockStmt(): Stmt {
const pos = this.pos();
if (this.test("let"))
return this.parseLet();
if (this.test("return"))
@ -1011,6 +1010,7 @@ class Parser {
}
// ...
private parseMultiLineBlockExpr(): Expr {
const pos = this.pos();
if (this.test("{"))
return this.parseBlock();
if (this.test("if"))
@ -1042,8 +1042,10 @@ class Parser {
this.eatSemicolon();
stmts.push(this.stmt({ type: "assign", subject: expr, value }, pos));
} else if (this.test(";")) {
this.step();
stmts.push(this.stmt({ type: "expr", expr }, expr.pos));
} else if (this.test("}")) {
this.step();
return this.expr({ type: "block", stmts, expr }, pos);
} else {
this.report("expected ';' or '}'");
@ -1118,7 +1120,7 @@ class Parser {
}
```
Then we test, if we've reached a single line statement, meaning it should end with a `;`, ishc as let, return and break.
Then we test, if we've reached a single line statement, meaning it should end with a `;`, such as let, return and break.
```ts
class Parser {
@ -1129,7 +1131,7 @@ class Parser {
if (this.test("fn")) {
// ...
} else if (this.test("{") || this.test("if") || this.test("loop")) {
let expr = this.parseMultiLineBlockExpr();
const expr = this.parseMultiLineBlockExpr();
stmts.push(this.stmt({ type: "expr", expr }, expr.pos));
// ...
}
@ -1152,6 +1154,7 @@ class Parser {
// ...
} else {
stmts.push(this.parseAssign());
this.eatSemicolon();
}
}
return stmts;
@ -1162,7 +1165,7 @@ class Parser {
If none of the above, we parse an assignment statement, which will parse an assignment statement or an expression statement.
## 3 Exercises
## Exercises
1. Implement boolean literals: `true` and `false` and null literal: `null`.
2. Implement the binary operators: `-`, `*`, `/`, `!=`, `<`, `>`, `<=`, `>=`, `or` and `and`.

View File

@ -85,12 +85,12 @@ function valueToString(value: Value): string {
return value.value ? "true" : "false";
}
if (value.type === "array") {
const valueStrings = result.values
const valueStrings = value.values
.map(value => value.toString());
return `[${valueStrings.join(", ")}]`;
}
if (value.type === "struct") {
const fieldStrings = Object.entries(result.fields)
const fieldStrings = Object.entries(value.fields)
.map(([key, value]) => `${key}: ${valueToString(value)}`);
return `struct { ${fieldStrings.join(", ")} }`;
}
@ -137,12 +137,12 @@ type SymMap = { [ident: string]: Sym }
class Syms {
private syms: SymMap = {};
public constructor(private parent?: SymMap) {}
public constructor(private parent?: Syms) {}
// ...
}
```
The `Sym` structure represents a symbol, and contains it's details such as the value and the position where the symbol is declared. The `SymMap` type is a key value map, which maps identifiers to their definition. To keep track of symbols in regard to scopes, we also define a `Syms` class. An instance of `Syms` is a node in a tree structure.
The `Sym` structure represents a symbol, and contains its details such as the value and the position where the symbol is declared. The `SymMap` type is a key value map, which maps identifiers to their definition. To keep track of symbols in regard to scopes, we also define a `Syms` class. An instance of `Syms` is a node in a tree structure.
We'll define a method for defining symbols.
@ -184,11 +184,11 @@ class Syms {
}
```
If the symbol is defined locally, return the symbol. Else if a the parent node is defined, defer to the parent. Otherwise, return a not-found result.
If the symbol is defined locally, return the symbol. Else if the parent node is defined, defer to the parent. Otherwise, return a not-found result.
## 4.3 Control flow
Most code will run with unbroken control flow, but some code will 'break' control flow. This is the case for return statements in functions and break statements in loops. To keep track of, if a return or break statement has been run, we'll define a data structure representing the control flow action of evaluted code.
Most code will run with unbroken control flow, but some code will 'break' control flow. This is the case for return statements in functions and break statements in loops. To keep track of, if a return or break statement has been run, we'll define a data structure representing the control flow action of evaluated code.
```ts
type Flow = {
@ -204,7 +204,7 @@ The 3 implemented options for control flow is breaking in a loop, returning in a
For ease of use, we'll add some functions to create the commonly used flow types and values.
```ts
function flowWalue(value: Value): Flow {
function flowValue(value: Value): Flow {
return { type: "value", value };
}
```
@ -255,8 +255,6 @@ We'll want a *root* symbol table, which stores all the predefined symbols. We al
class Evaluator {
private root = new Syms();
// ...
public defineBuiltins() { /*...*/ }
// ...
}
```
@ -271,10 +269,11 @@ type FnDef = {
params: Param[],
body: Expr,
id: number,
syms: Syms,
};
```
The parameters are needed, so that we can verify when calling, that we call with the correct amount of arguments. The body is the AST expression to be evaluated. And an identifier, so that we can refer to the definition by it's id `fnDefId`.
The parameters are needed, so that we can verify when calling, that we call with the correct amount of arguments. The body is the AST expression to be evaluated. An identifier so that we can refer to the definition by its id `fnDefId`. And a symbol table, at the time and place of the definition, as opposed to the callers time and place.
```ts
class Evaluator {
@ -283,7 +282,7 @@ class Evaluator {
}
```
We'll also add an array of function definitions to the evaluator class. The index of a function definition will also be it's id.
We'll also add an array of function definitions to the evaluator class. The index of a function definition will also be its id.
## 4.5 Expressions
@ -292,12 +291,12 @@ Let's make a function `evalExpr` for evaluating expressions.
```ts
class Evaluator {
// ...
public evalExpr(expr: Expr): Flow {
if (expr.type === "error") {
public evalExpr(expr: Expr, syms: Syms): Flow {
if (expr.kind.type === "error") {
throw new Error("error in AST");
}
// ...
throw new Error(`unknown expr type "${expr.type}"`);
throw new Error(`unknown expr type "${expr.kind.type}"`);
}
// ...
}
@ -312,10 +311,10 @@ class Evaluator {
// ...
public evalExpr(expr: Expr, syms: Syms): Flow {
// ...
if (expr.type === "ident") {
const result = syms.get(expr.value);
if (expr.kind.type === "ident") {
const result = syms.get(expr.kind.value);
if (!result.ok)
throw new Error(`undefined symbol "${expr.value}"`);
throw new Error(`undefined symbol "${expr.kind.value}"`);
return flowValue(result.sym.value);
}
// ...
@ -331,17 +330,17 @@ class Evaluator {
// ...
public evalExpr(expr: Expr, syms: Syms): Flow {
// ...
if (expr.type === "null") {
if (expr.kind.type === "null") {
return flowValue({ type: "null" });
}
if (expr.type === "int") {
return flowValue({ type: "int", value: expr.value });
if (expr.kind.type === "int") {
return flowValue({ type: "int", value: expr.kind.value });
}
if (expr.type === "string") {
return flowValue({ type: "string", value: expr.value });
if (expr.kind.type === "string") {
return flowValue({ type: "string", value: expr.kind.value });
}
if (expr.type === "bool") {
return flowValue({ type: "int", value: expr.value });
if (expr.kind.type === "bool") {
return flowValue({ type: "bool", value: expr.kind.value });
}
// ...
}
@ -358,8 +357,8 @@ class Evaluator {
// ...
public evalExpr(expr: Expr, syms: Syms): Flow {
// ...
if (expr.type === "group") {
return this.evalExpr(expr.expr, syms);
if (expr.kind.type === "group") {
return this.evalExpr(expr.kind.expr, syms);
}
// ...
}
@ -376,15 +375,15 @@ class Evaluator {
// ...
public evalExpr(expr: Expr, syms: Syms): Flow {
// ...
if (expr.type === "field") {
const [subject, subjectFlow] = expectValue(this.evalExpr(expr.subject, syms));
if (expr.kind.type === "field") {
const [subject, subjectFlow] = expectValue(this.evalExpr(expr.kind.subject, syms));
if (!subject)
return subjectFlow;
if (subject.type !== "struct")
throw new Error(`cannot use field operator on ${subject.type} value`);
if (!(expr.value in subject.fields))
throw new Error(`field ${expr.value} does not exist on struct`);
return subject.fields[expr.value];
if (!(expr.kind.value in subject.fields))
throw new Error(`field ${expr.kind.value} does not exist on struct`);
return flowValue(subject.fields[expr.kind.value]);
}
// ...
}
@ -401,11 +400,11 @@ class Evaluator {
// ...
public evalExpr(expr: Expr, syms: Syms): Flow {
// ...
if (expr.type === "index") {
const [subject, subjectFlow] = expectValue(this.evalExpr(expr.subject, syms));
if (expr.kind.type === "index") {
const [subject, subjectFlow] = expectValue(this.evalExpr(expr.kind.subject, syms));
if (!subject)
return subjectFlow;
const [value, valueFlow] = expectValue(this.evalExpr(expr.value, syms));
const [value, valueFlow] = expectValue(this.evalExpr(expr.kind.value, syms));
if (!value)
return valueFlow;
if (subject.type === "struct") {
@ -431,11 +430,11 @@ class Evaluator {
if (subject.type === "string") {
if (value.type !== "int")
throw new Error(`cannot index into string with ${value.type} value`);
if (value.value >= subject.values.length)
if (value.value >= subject.value.length)
throw new Error("index out of range");
if (value.value < 0) {
const negativeIndex = subject.values.length + value.value;
if (negativeIndex < 0 || negativeIndex >= subject.values.length)
const negativeIndex = subject.value.length + value.value;
if (negativeIndex < 0 || negativeIndex >= subject.value.length)
throw new Error("index out of range");
return flowValue({ type: "int", value: subject.value.charCodeAt(negativeIndex) });
}
@ -451,7 +450,7 @@ class Evaluator {
The index operator can be evaluated on a subject of either struct, array or string type. If evaluated on the struct type, we expect a string containing the field name. If the field does not exist, we return a null value. This is in contrast to the field operator, which throws an error, if no field is found. If the subject is instead an array, we expect a value of type int. We check if either the int value index or negative index is in range of the array values. If so, return the value at the index or the negative index. If the subject is a string, evaluation will behave similarly to an array, evaluating to an int value representing the value of the text character at the index or negative index.
The negative index is when a negative int value is passed as index, where the index will start at the end of the array. Given an array `vs` containing the values `["a", "b", "c"]` in listed order, the indices `0`, `1` and `2` will evalute to the values `"a"`, `"b"` and `"c"`, whereas the indices `-1`, `-2`, `-3` will evaluate to the values `"c"`, `"b"` and `"a"`. A negative index implicitly starts at the length of the array and subtracts the absolute index value.
The negative index is when a negative int value is passed as index, where the index will start at the end of the array. Given an array `vs` containing the values `["a", "b", "c"]` in listed order, the indices `0`, `1` and `2` will evaluate to the values `"a"`, `"b"` and `"c"`, whereas the indices `-1`, `-2`, `-3` will evaluate to the values `"c"`, `"b"` and `"a"`. A negative index implicitly starts at the length of the array and subtracts the absolute index value.
### 4.5.6 Call expressions
@ -460,18 +459,18 @@ class Evaluator {
// ...
public evalExpr(expr: Expr, syms: Syms): Flow {
// ...
if (expr.type === "call") {
const [subject, subjectFlow] = expectValue(this.evalExpr(expr.subject, syms));
if (expr.kind.type === "call") {
const [subject, subjectFlow] = expectValue(this.evalExpr(expr.kind.subject, syms));
if (!subject)
return subjectFlow;
const args: Value[] = [];
for (const arg of expr.args) {
const [value, valueFlow] = expectValue(this.evalExpr(expr.value, syms));
for (const arg of expr.kind.args) {
const [value, valueFlow] = expectValue(this.evalExpr(arg, syms));
if (!value)
return valueFlow;
args.push(value);
}
if (subject.type === "builtin") {
if (subject.type === "builtin_fn") {
return this.executeBuiltin(subject.name, args, syms);
}
if (subject.type !== "fn")
@ -481,8 +480,8 @@ class Evaluator {
const fnDef = this.fnDefs[subject.fnDefId];
if (fnDef.params.length !== args.length)
throw new Error("incorrect amount of arguments in call to function");
let fnScopeSyms = new Syms(this.root);
for (const [i, param] in fnDef.params.entries()) {
let fnScopeSyms = new Syms(fnDef.syms);
for (const [i, param] of fnDef.params.entries()) {
fnScopeSyms.define(param.ident, { value: args[i], pos: param.pos });
}
const flow = this.evalExpr(fnDef.body, fnScopeSyms);
@ -498,20 +497,20 @@ class Evaluator {
}
```
The first thing we do is evaluate the subject expression of the call (`subject(...args)`). If that yeilds a value, we continue. Then we evaluate each of the arguments in order. If evaluation of an argument doesn't yeild a value, we return immediately. Then, if the subject evaluated to a builtin value, we call `executeBuiltin`, which we will define later, with the builtin name, call arguments and symbol sable. Otherwise, we assert that the subject value is a function and that a function definition with the id exists. We then check that the correct amount of arguments are passed. Then, we make a new symbol table with the root table as parent, which will be the called functions symbols. We assign each argument value to the corrosponding parameter name, dictated by argument order. We then evaluate the function body. Finally, we check that the control flow results in either a value, which we simply return, or a return flow, which we convert to a value.
The first thing we do is evaluate the subject expression of the call (`subject(...args)`). If that yields a value, we continue. Then we evaluate each of the arguments in order. If evaluation of an argument doesn't yield a value, we return immediately. Then, if the subject evaluated to a builtin value, we call `executeBuiltin`, which we will define later, with the builtin name, call arguments and symbol table. Otherwise, we assert that the subject value is a function and that a function definition with the id exists. We then check that the correct amount of arguments are passed. Then, we make a new symbol table with the function definition's symbol table as parent, which will be the called function's symbols. We assign each argument value to the corresponding parameter name, dictated by argument order. We then evaluate the function body. Finally, we check that the control flow results in either a value, which we simply return, or a return flow, which we convert to a value.
### 4.5.7 Unary expressions
Next, we will implement evaluation of unary expressions, meaning postfix expressions with one operand such as when using the `not` operator.
Next, we will implement evaluation of unary expressions, meaning postfix expressions with one operand, such as when using the `not` operator.
```ts
class Evaluator {
// ...
public evalExpr(expr: Expr, syms: Syms): Flow {
// ...
if (expr.type === "unary") {
if (expr.unaryType === "not") {
const [subject, subjectFlow] = expectValue(this.evalExpr(expr.subject, syms));
if (expr.kind.type === "unary") {
if (expr.kind.unaryType === "not") {
const [subject, subjectFlow] = expectValue(this.evalExpr(expr.kind.subject, syms));
if (!subject)
return subjectFlow;
if (subject.type === "bool") {
@ -519,7 +518,7 @@ class Evaluator {
}
throw new Error(`cannot apply not operator on type ${subject.type}`);
}
throw new Error(`unhandled unary operation ${expr.unaryType}`);
throw new Error(`unhandled unary operation ${expr.kind.unaryType}`);
}
// ...
}
@ -538,23 +537,23 @@ class Evaluator {
// ...
public evalExpr(expr: Expr, syms: Syms): Flow {
// ...
if (expr.type === "binary") {
const [left, leftFlow] = expectValue(this.evalExpr(expr.left, syms));
if (expr.kind.type === "binary") {
const [left, leftFlow] = expectValue(this.evalExpr(expr.kind.left, syms));
if (!left)
return leftFlow;
const [right, rightFlow] = expectValue(this.evalExpr(expr.right, syms));
const [right, rightFlow] = expectValue(this.evalExpr(expr.kind.right, syms));
if (!right)
return rightFlow;
if (expr.binaryType === "+") {
if (expr.kind.binaryType === "+") {
if (left.type === "int" && right.type === "int") {
return flowValue({ type: "int", value: left.value + right.value });
}
if (left.type === "string" && right.type === "string") {
return flowValue({ type: "string", value: left.value + right.value });
}
throw new Error(`cannot apply ${expr.binaryType} operator on types ${left.type} and ${right.type}`);
throw new Error(`cannot apply ${expr.kind.binaryType} operator on types ${left.type} and ${right.type}`);
}
if (expr.binaryType === "==") {
if (expr.kind.binaryType === "==") {
if (left.type === "null" && right.type === "null") {
return flowValue({ type: "bool", value: true });
}
@ -564,12 +563,13 @@ class Evaluator {
if (left.type !== "null" && right.type === "null") {
return flowValue({ type: "bool", value: false });
}
if (["int", "string", "bool"].includes(left.type) && left.type === right.type) {
if ((left.type === "int" || left.type === "string" || left.type === "bool")
&& left.type === right.type) {
return flowValue({ type: "bool", value: left.value === right.value });
}
throw new Error(`cannot apply ${expr.binaryType} operator on types ${left.type} and ${right.type}`);
throw new Error(`cannot apply ${expr.kind.binaryType} operator on types ${left.type} and ${right.type}`);
}
throw new Error(`unhandled binary operation ${expr.unaryType}`);
throw new Error(`unhandled binary operation ${expr.kind.binaryType}`);
}
// ...
}
@ -577,7 +577,7 @@ class Evaluator {
}
```
Add operation (`+`) is straight forward. Evaluate the left expressions, evaluate the right expressions and return a value with the result of adding left and right. Addition should work on integers and strings. Add string two strings results in a new string consisting of the left and right values concatonated.
Add operation (`+`) is straight forward. Evaluate the left expressions, evaluate the right expressions and return a value with the result of adding left and right. Addition should work on integers and strings. Adding two strings results in a new string consisting of the left and right values concatenated.
The equality operator (`==`) is a bit more complicated. It only results in values of type bool. You should be able to check if any value is null. Otherwise, comparison should only be allowed on two values of same type.
@ -594,16 +594,16 @@ class Evaluator {
// ...
public evalExpr(expr: Expr, syms: Syms): Flow {
// ...
if (expr.type === "if") {
const [condition, conditionFlow] = expectValue(this.evalExpr(expr.condition, syms));
if (!condition)
return conditionFlow;
if (condition.type !== "bool")
throw new Error(`cannot use value of type ${subject.type} as condition`);
if (condition.value)
return this.evalExpr(expr.truthy, syms);
if (expr.falsy)
return this.evalExpr(exor.falsy, syms);
if (expr.kind.type === "if") {
const [cond, condFlow] = expectValue(this.evalExpr(expr.kind.cond, syms));
if (!cond)
return condFlow;
if (cond.type !== "bool")
throw new Error(`cannot use value of type ${cond.type} as condition`);
if (cond.value)
return this.evalExpr(expr.kind.truthy, syms);
if (expr.kind.falsy)
return this.evalExpr(expr.kind.falsy, syms);
return flowValue({ type: "null" });
}
// ...
@ -616,16 +616,16 @@ We start by evaluating the condition expression. The condition value should be a
### 4.5.10 Loop expressions
Next, we'll implement the loop expression. The loop expression will repeatedly evaluate the body expression while throwing away the resulting values, until it results in breaking control flow. If the control flow is of type break, the loop expression itself will evalute to the break's value.
Next, we'll implement the loop expression. The loop expression will repeatedly evaluate the body expression while throwing away the resulting values, until it results in breaking control flow. If the control flow is of type break, the loop expression itself will evaluate to the break's value.
```ts
class Evaluator {
// ...
public evalExpr(expr: Expr, syms: Syms): Flow {
// ...
if (expr.type === "loop") {
if (expr.kind.type === "loop") {
while (true) {
const flow = this.evaluate(expr.body, syms);
const flow = this.evalExpr(expr.kind.body, syms);
if (flow.type === "break")
return flowValue(flow.value);
if (flow.type !== "value")
@ -638,7 +638,7 @@ class Evaluator {
}
```
First, start an infinite loop. In each iteration, evalute the loop body. If the resulting control flow is breaking, return the break value. If the control flow is not a value, meaning return or other unimplemented control flow, just return the control flow. Otherwise, discard the value and repeate.
First, start an infinite loop. In each iteration, evaluate the loop body. If the resulting control flow is breaking, return the break value. If the control flow is not a value, meaning return or other unimplemented control flow, just return the control flow. Otherwise, discard the value and repeate.
## 4.5.11 Block expressions
@ -650,15 +650,15 @@ class Evaluator {
// ...
public evalExpr(expr: Expr, syms: Syms): Flow {
// ...
if (expr.type === "block") {
if (expr.kind.type === "block") {
let scopeSyms = new Syms(syms);
for (const stmt of block.stmts) {
for (const stmt of expr.kind.stmts) {
const flow = this.evalStmt(stmt, scopeSyms);
if (flow.type !== "value")
return flow;
}
if (expr.expr)
return this.evalExpr(expr.expr, scopeSyms);
if (expr.kind.expr)
return this.evalExpr(expr.kind.expr, scopeSyms);
return flowValue({ type: "null" });
}
// ...
@ -682,11 +682,11 @@ For evaluating statements, we'll make a function called `evalStmt` .
class Evaluator {
// ...
public evalStmt(stmt: Stmt, syms: Syms): Flow {
if (stmt.type === "error") {
if (stmt.kind.type === "error") {
throw new Error("error in AST");
}
// ...
throw new Error(`unknown stmt type "${expr.type}"`);
throw new Error(`unknown stmt type "${stmt.kind.type}"`);
}
// ...
}
@ -703,10 +703,10 @@ class Evaluator {
// ...
public evalStmt(stmt: Stmt, syms: Syms): Flow {
// ...
if (stmt.type === "break") {
if (!stmt.expr)
return { type: "break" };
const [value, valueFlow] = expectValue(this.evalExpr(stmt.expr));
if (stmt.kind.type === "break") {
if (!stmt.kind.expr)
return { type: "break", value: { type: "null" } };
const [value, valueFlow] = expectValue(this.evalExpr(stmt.kind.expr, syms));
if (!value)
return valueFlow;
return { type: "break", value };
@ -728,10 +728,10 @@ class Evaluator {
// ...
public evalStmt(stmt: Stmt, syms: Syms): Flow {
// ...
if (stmt.type === "return") {
if (!stmt.expr)
return { type: "return" };
const [value, valueFlow] = expectValue(this.evalExpr(stmt.expr));
if (stmt.kind.type === "return") {
if (!stmt.kind.expr)
return { type: "return", value: { type: "null" } };
const [value, valueFlow] = expectValue(this.evalExpr(stmt.kind.expr, syms));
if (!value)
return valueFlow;
return { type: "return", value };
@ -753,8 +753,8 @@ class Evaluator {
// ...
public evalStmt(stmt: Stmt, syms: Syms): Flow {
// ...
if (stmt.type === "expr") {
return this.evalExpr(stmt.expr);
if (stmt.kind.type === "expr") {
return this.evalExpr(stmt.kind.expr, syms);
}
// ...
}
@ -771,41 +771,42 @@ class Evaluator {
// ...
public evalStmt(stmt: Stmt, syms: Syms): Flow {
// ...
if (stmt.type === "assign") {
const [value, valueFlow] = expectValue(this.evalExpr(stmt.value, syms));
if (stmt.kind.type === "assign") {
const [value, valueFlow] = expectValue(this.evalExpr(stmt.kind.value, syms));
if (!value)
return valueFlow;
if (stmt.subject.type === "ident") {
const ident = stmt.subject.value;
const { ok: found, sym } = syms.get(ident);
if (!found)
if (stmt.kind.subject.kind.type === "ident") {
const ident = stmt.kind.subject.kind.value;
const getResult = syms.get(ident);
if (!getResult.ok)
throw new Error(`cannot assign to undefined symbol "${ident}"`);
const { sym } = getResult;
if (sym.value.type !== value.type)
throw new Error(`cannot assign value of type ${value.type} to symbol originally declared ${sym.value.type}`);
sym.value = value;
return valueFlow({ type: "null" });
return flowValue({ type: "null" });
}
if (stmt.subject.type === "field") {
const [subject, subjectFlow] = expectValue(this.evalExpr(stmt.subject.subject, syms));
if (stmt.kind.subject.kind.type === "field") {
const [subject, subjectFlow] = expectValue(this.evalExpr(stmt.kind.subject.kind.subject, syms));
if (!subject)
return subjectFlow;
if (subject.type !== "struct")
throw new Error(`cannot use field operator on ${subject.type} value`);
subject.fields[stmt.subject.value] = value;
return valueFlow({ type: "null" });
subject.fields[stmt.kind.subject.kind.value] = value;
return flowValue({ type: "null" });
}
if (stmt.subject.type === "index") {
const [subject, subjectFlow] = expectValue(this.evalExpr(stmt.subject.subject, syms));
if (stmt.kind.subject.kind.type === "index") {
const [subject, subjectFlow] = expectValue(this.evalExpr(stmt.kind.subject.kind.subject, syms));
if (!subject)
return subjectFlow;
const [index, indexFlow] = expectValue(this.evalExpr(stmt.subject.value, syms));
const [index, indexFlow] = expectValue(this.evalExpr(stmt.kind.subject.kind.value, syms));
if (!index)
return valueFlow;
return indexFlow;
if (subject.type === "struct") {
if (index.type !== "string")
throw new Error(`cannot index into struct with ${index.type} value`);
subject.fields[index.value] = value;
return valueFlow({ type: "null" });
return flowValue({ type: "null" });
}
if (subject.type === "array") {
if (index.type !== "int")
@ -813,19 +814,19 @@ class Evaluator {
if (index.value >= subject.values.length)
throw new Error("index out of range");
if (index.value >= 0) {
subject.value[index.value] = value;
subject.values[index.value] = value;
} else {
const negativeIndex = subject.values.length + index.value;
if (negativeIndex < 0 || negativeIndex >= subject.values.length)
throw new Error("index out of range");
subject.value[negativeIndex] = value;
return valueFlow({ type: "null" });
subject.values[negativeIndex] = value;
return flowValue({ type: "null" });
}
return valueFlow({ type: "null" });
return flowValue({ type: "null" });
}
throw new Error(`cannot use field operator on ${subject.type} value`);
}
throw new Error(`cannot assign to ${stmt.subject.type} expression`);
throw new Error(`cannot assign to ${stmt.kind.subject.kind.type} expression`);
}
// ...
}
@ -839,7 +840,7 @@ For assigning to identifiers, eg. `a = 5`, we start by finding the symbol. If no
For assigning to fields, eg. `a.b = 5`, we evaluate the inner (field expression) subject expression, `a` in this case. Then we reassign the field value or assign to a new field, if it doesn't exist.
And then, for assigning to indeces, eg. `a[b] = 5`, we evalute the inner (index expression) subject `a` and index value `b` in that order. If `a` is a struct, we check that `b` is a string and assign to the field, the string names. Else, if `a` is an array, we check that `b` is an int and assign to the index or negative index (see 4.5.5 Index expressions).
And then, for assigning to indices, eg. `a[b] = 5`, we evaluate the inner (index expression) subject `a` and index value `b` in that order. If `a` is a struct, we check that `b` is a string and assign to the field, the string names. Else, if `a` is an array, we check that `b` is an int and assign to the index or negative index (see 4.5.5 Index expressions).
### 4.6.5 Let statements
@ -850,14 +851,14 @@ class Evaluator {
// ...
public evalStmt(stmt: Stmt, syms: Syms): Flow {
// ...
if (stmt.type === "let") {
if (syms.definedLocally(stmt.param.ident))
throw new Error(`cannot redeclare symbol "${stmt.param.ident}"`);
const [value, valueFlow] = expectValue(this.evalExpr(stmt.value, syms));
if (stmt.kind.type === "let") {
if (syms.definedLocally(stmt.kind.param.ident))
throw new Error(`cannot redeclare symbol "${stmt.kind.param.ident}"`);
const [value, valueFlow] = expectValue(this.evalExpr(stmt.kind.value, syms));
if (!value)
return valueFlow;
syms.define(stmt.param.ident, value);
return valueFlow({ type: "null" });
syms.define(stmt.kind.param.ident, { value: value, pos: stmt.pos });
return flowValue({ type: "null" });
}
// ...
}
@ -874,10 +875,10 @@ class Evaluator {
// ...
public evalStmt(stmt: Stmt, syms: Syms): Flow {
// ...
if (stmt.type === "fn") {
if (syms.definedLocally(stmt.ident))
throw new Error(`cannot redeclare function "${stmt.ident}"`);
const { params, body } = stmt;
if (stmt.kind.type === "fn") {
if (syms.definedLocally(stmt.kind.ident))
throw new Error(`cannot redeclare function "${stmt.kind.ident}"`);
const { params, body } = stmt.kind;
let paramNames: string[] = [];
for (const param of params) {
@ -887,9 +888,9 @@ class Evaluator {
}
const id = this.fnDefs.length;
this.fnDefs.push({ params, body, id });
this.syms.define(stmt.ident, { type: "fn", fnDefId: id });
return flowValue({ type: "none" });
this.fnDefs.push({ params, body, id, syms });
syms.define(stmt.kind.ident, { value: { type: "fn", fnDefId: id }, pos: stmt.pos });
return flowValue({ type: "null" });
}
// ...
}
@ -906,9 +907,9 @@ We'll want a function for evaluating the top-level statements.
```ts
class Evaluator {
// ...
public evalStmts(stmts: Stmt[], syms: Syms) {
let scopeSyms = new Syms(syms);
for (const stmt of block.stmts) {
public evalStmts(stmts: Stmt[]) {
let scopeSyms = new Syms(this.root);
for (const stmt of stmts) {
const flow = this.evalStmt(stmt, scopeSyms);
if (flow.type !== "value")
throw new Error(`${flow.type} on the loose!`);
@ -1063,13 +1064,13 @@ class Evaluator {
private executeBuiltin(name: string, args: Value[], syms: Syms): Flow {
// ...
if (name === "println") {
if (args.length < 1)
if (args.length < 1 || args[0].type !== "string")
throw new Error("incorrect arguments");
let msg = args[0];
let msg = args[0].value;
for (const arg of args.slice(1)) {
if (!msg.includes("{}"))
throw new Error("incorrect arguments");
msg.replace("{}", valueToString(arg));
msg = msg.replace("{}", valueToString(arg));
}
console.log(msg);
return flowValue({ type: "null" });
@ -1080,7 +1081,7 @@ class Evaluator {
}
```
This function takes a format-string as the first argument, and, corrosponding to the format-string, values to be formattet in the correct order.
This function takes a format-string as the first argument, and, corresponding to the format-string, values to be formatted in the correct order.
Examples of how to use the function follows.
@ -1095,7 +1096,7 @@ println("{} + {} = {}", 1, 2, 1 + 2);
### 4.8.5 Exit
Normally, the evaluator will return a zero-exit code, meanin no error. In case we program should result in an error code, we'll need an exit function.
Normally, the evaluator will return a zero-exit code, meaning no error. In case the program should result in an error code, we'll need an exit function.
```ts
class Evaluator {
@ -1151,7 +1152,7 @@ class Evaluator {
if (args.length !== 1 || args[0].type !== "string")
throw new Error("incorrect arguments");
const value = parseInt(args[0].value);
if (value === NaN)
if (isNaN(value))
return flowValue({ type: "null" });
return flowValue({ type: "int", value });
}
@ -1171,14 +1172,20 @@ Finally, we need a way to define the builtin functions in the symbol table.
class Evaluator {
// ...
public defineBuiltins() {
this.root.define("array", { type: "builtin_fn", name: "array" });
this.root.define("struct", { type: "builtin_fn", name: "struct" });
this.root.define("array_push", { type: "builtin_fn", name: "array_push" });
this.root.define("array_len", { type: "builtin_fn", name: "array_len" });
this.root.define("string_concat", { type: "builtin_fn", name: "string_concat" });
this.root.define("string_len", { type: "builtin_fn", name: "string_len" });
this.root.define("println", { type: "builtin_fn", name: "println" });
this.root.define("exit", { type: "builtin_fn", name: "exit" });
function defineBuiltin(syms: Syms, name: string) {
syms.define(name, { value: { type: "builtin_fn", name } });
}
defineBuiltin(this.root, "array");
defineBuiltin(this.root, "struct");
defineBuiltin(this.root, "array_push");
defineBuiltin(this.root, "array_len");
defineBuiltin(this.root, "string_concat");
defineBuiltin(this.root, "string_len");
defineBuiltin(this.root, "println");
defineBuiltin(this.root, "exit");
defineBuiltin(this.root, "to_string");
defineBuiltin(this.root, "string_to_int");
}
// ...
}

250
compiler/chapter_5.md Normal file
View File

@ -0,0 +1,250 @@
# 5 Make a test program
To test the evaluator and any future runtimes we'll remake the calculator from chapter 1.
## 5.1 Hello world
First, we'll need to setup the parser and evaluator.
I'll show this example using Deno.
```ts
const filename = Deno.args[0];
const text = await Deno.readTextFile(filename);
const parser = new Parser(new Lexer(text));
const ast = parser.parseStmts();
const evaluator = new Evaluator();
evaluator.defineBuiltins();
evaluator.evalStmts(ast);
```
Now we can write source code in a file.
```rs
println("hello world");
```
Save it to a file, eg. `program.txt`, run it with the following command.
```sh
deno run -A main.ts program.txt
```
And it should output the following.
```
hello world
```
## 5.2 Representation in code
We'll use the same representation in code as we did in chapter one. The following is how it was done in Javascript.
```js
const expr = {
type: "add",
left: { type: "int", value: 1 },
right: {
type: "multiply",
left: { type: "int", value: 2 },
right: { type: "int", value: 3 },
},
};
```
To do this is our language, (unless you've implemented struct literals), we'll need to use the `struct` builtin and field expression assignment.
```rs
let expr = struct();
expr["type"] = "add";
expr["left"] = struct();
expr.left["type"] = "int";
expr.left["value"] = 1;
expr["right"] = struct();
expr.right["type"] = "multiply";
expr.right["left"] = struct();
expr.right.left["type"] = "int";
expr.right.left["value"] = 2;
expr.right["right"] = struct();
expr.right.right["type"] = "int";
expr.right.right["value"] = 3;
```
If we print this value, we should expect an output like the following.
```
{ type: "add", left: { type: "int", value: 1 }, right: { type: "multiply", left: { type: "int", value: 2 }, right: { type: "int", value: 3 } } }
```
## 5.3 Evaluating expressions
Exactly like in chapter 1, we need a function for evaluating the expression structure described above.
```rs
fn eval_expr(node) {
if == node.type "int" {
return node.value;
}
if == node.type "add" {
let left = eval_expr(node.left);
let right = eval_expr(node.right);
return + left right;
}
if == node.type "multiply" {
let left = eval_expr(node.left);
let right = eval_expr(node.right);
return * left right;
}
println("unknown expr type");
exit(1);
}
```
### Exercises
1. Implement `subtract` and `divide`.
## 5.4 Parsing source code
### 5.4.1 Lexer
```rs
fn char_in_string(ch, val) {
let i = 0;
let val_length = string_len(val);
loop {
if >= i val_len {
break;
}
if == ch val[i] {
return true;
}
i = + i 1;
}
false
}
fn lex(text) {
let i = 0;
let tokens = array();
let text_length = array_len(text);
loop {
if >= i text_length {
break;
}
loop {
if not char_in_string(text[i], " \t\n") {
break;
}
i = + i 1;
}
if char_in_string(text[i], "1234567890") {
let text_val = "";
loop {
if not char_in_string(text[i], "1234567890") {
break;
}
text_val = string_concat(text_val, text[i]);
i = + i 1;
}
let token = struct();
token["type"] = "int";
token["value"] = string_to_int(text_val);
array_push(token);
} else if == text[i] "+"[0] {
i = + i 1;
let token = struct();
token["type"] = "+";
array_push(token);
} else if == text[i] "*"[0] {
i = + i 1;
let token = struct();
token["type"] = "*";
array_push(token);
} else {
println("illegal character");
exit(1);
}
}
tokens
}
```
#### Exercises
1. Implement `-` and `/` for subtraction and division.
2. \* Add position (line and column number) to each token.
3. \*\* Rewrite lexer into an iterator (eg. use the OOP iterator pattern).
### 5.4.2 Parser
```rs
fn parser_new(tokens) {
let self = struct();
self["tokens"] = tokens;
self["i"] = 0;
}
fn parser_step(self) { self.i = + self.i 1; }
fn parser_done(self) { >= self.i array_len(self.tokens) }
fn parser_current(self) { self.tokens[self.i] }
fn parser_parse_expr(self) {
if parser_done(self) {
println("expected expr, got end-of-file");
exit(1);
} else if == parser_current(self).type "+" {
parser_step(self);
let left = parser_parse_expr(self);
let right = parser_parse_expr(self);
let node = struct();
node["type"] = "add";
node["left"] = left;
node["right"] = right;
node
} else if == parser_current(self).type "*" {
parser_step(self);
let left = parser_parse_expr(self);
let right = parser_parse_expr(self);
let node = struct();
node["type"] = "multiply";
node["left"] = left;
node["right"] = right;
node
} else if == parser_current(self).type "int" {
let value = parser_current(self).value;
parser_step(self);
let node = struct();
node["type"] = "int";
node["value"] = value;
node
} else {
println("expected expr");
exit(1);
}
}
```
#### Exercises
1. Implement subtraction and division.
2. \* Add position (line and column number) to each expression.
## 5.5 Putting it together
```rs
let text = "+ 1 2";
let tokens = lex(text);
let parser = parser_new(tokens);
let expr = parser_parse_expr();
let result = eval_expr(expr);
println("result of {} is {}", text, result);
```
Running the code, we should expect console output like the following.
```
result of + 1 2 is 3
```
## Exercises
1. Make a performance benchmark.

64
compiler/chapter_6.md Normal file
View File

@ -0,0 +1,64 @@
# 6 Summary theory
What's the next step?
To answer that question, we'll have to understand a bit of theory, to know where we are and how we got here.
## 6.1 Parsing source code into AST
We started in chapter 2 and 3 by making a parser consisting of the `Lexer` and `Parser` classes. The parser takes source code and translates, or *parses*, it into a structured representation called AST. AST is short for __A__bstract __s__yntax __t__ree, which means that the original program, or code, is represented as a tree-structure.
We defined the *structured* of the AST (*code as tree-structure*), by consisting of the `Stmt`, `Expr`, ... types.
We converted the source code from text to AST, because AST is easier to work with in the step that followed.
## 6.2 Evaluating AST
In chapter 4 we made an evaluator consisting primarily of the `Value` type, `Syms` class and `Evaluator` class. The evaluator, or AST evaluator, takes a program represented in AST (*code as tree-structure*), goes through the tree-structure and calculates the resulting values.
Execution using the evaluater is a top-down, outside-in proceess, where we start with the root node, and then call `evalStmt` and `evalExpr` for each child node recursively. We then use the value optained by evaluating the child nodes to calculate the result for a node.
The only upside for implementing an AST evaluator, is that it is simple to implement and think about. An AST evaluator operates on an AST which is a comparatively close representation of the original source code. Humans understand programs from the point of the source code. Therefore, an AST evaluator executes the code, or *thinks about* the program, the same way we humans think about program. The human brain functions efficiently using layers of abstraction.
Take the math expression `2 * (3 + 4)`. We start by examining the entire expression. Then we split up the expression into its components, that being a multiplication of `2` by the addition of `3` and `4`. We then, to calculate the result of the outer expression, calculate the result of the inner expression: `3 + 4 = 7`. Then we use that result and calculate the multiplicat: `2 * 7 = 14`. The evaluator functions exactly this way.
There are multiple downsides to AST evaluation.
One downside is that some features of the source languages are *ugly* to implement. While expression evaluation is conceptually simple to evaluate using function calls, other features are not. Control flow related features such as break and return cannot be evaluated using only function calls. This is because function calls follow the (control) flow, while break and return *breaks* the control flow.
But the primary downside of AST evaluation is performance. While humans are most efficient when using layers of abstractions, computers are not. For various reasons, calling functions recursively repeatedly and jumping through tree-structures is very inefficient for computers. Both in terms of memory footprint and execution time. Computers are much more efficient with sequential execution.
Take the expression defined above. Now imagine we're describing the instructions for how to get the result. We would of course look at the while expression, then break it down. We could then formulate *instructions* such as *add 3 to 4, then multiply by 2*. Now if we execute the instructions, we don't start by examening the entire expression, we just execute the instructions in order. We have now translated the expression into linear execution, which computers are very good at running.
## 6.3 Instructions
The 2 main pros of using instructions counter the cons of AST evaluation.
A lesser point pertaining to implementing control flow, is that everything is done using the sequential instructions. This means special control flow such as break, are handled in the same manner as regular control flow like loops.
The primary upside compared to AST evaluation is performance. Running instructions is a sequential operation, which could for example look like an array of instructions, where a location of the current instruction is stored, and then execution loops over the array with a for loop. This is A LOT faster compared to AST evaluation with a tree-structure and recursive function calls. The technical details of why the performance is faster this way is hard to explain both simply and accurately, so I'll spare the explanation. You can think about it like this: the parser generates a tree, to evalute the program, we need to do tree-traversal, tree-traversal is slow, therefore we should minimize the amount of tree-traversal. In the AST evaluator, tree-traversal is used on an expression everytime it is run, in a loop or function call for instance. Instead we'd rather want to do tree-traversal once for the entire program, and then generate these instructions, which does not require tree-traversal to run, ie. we do the costly tree-traversal once instead of multiple times.
The primary downside of this approach compared to AST evaluation is the effort required. AST evalution is both conceptually simple and relatively simple to implement, as it executes the code in just the form the parser outputs, which is also close to the source language. To run instructions instead, we need to translate the program from the AST into runnable sequential instructions. To evalute using instructions, instead of AST evalution, we need to do the following conceptual steps (implementationally seperate in our case):
- Parse source code into AST.
- ~~Evaluate AST.~~
- Resolve symbols. Like how we used the symbol table in the evaluator.
- Check semantic code structure. We can often parse source code, that cannot be run. In the evaluator we had checks different places, that would throw errors. This is the same concept. This will also include type checking in our case.
- Translate (or compile) resolved and checked AST into instructions.
- Execute instructions.
As can be seen, some of the needed steps are steps which are combined in the evaluator. Symbol resolution (resolving) is comparable to how we resolved symbol values (variables) in the evaluator. Semantic code checking is comparative to how we check the AST at different places in the evaluator, such as checking that the `+`-operator is only applied to either 2 ints or 2 strings.
Translation into instructions and seperate executions are new steps. These are also conceptually more advanced than AST evaluation in the sense, that AST evaluation operates on a high level representation of the code meaning it's close to the original source code, whereas instructions are further away, meaning more low level. This means we need to make some information needed for executing instructions explicit, which may be implicit in AST representation because the tree-structure in-an-of-itself contains some of the information.
We also need to design the instructions, meaning we need to choose which instructions should be available (instruction set) and some details of how they should be run. The design decisions in this step is essentially arbitrary, meaning there often is not a *correct* decision, whereas the evaluation is designed exactly to evaluate the AST which in some sense is designed to exactly represent the source code. These design decisions require trade-offs, eg. of perfomance, simplicity, ease of implementation, portability, versatility.
## 6.4 Virtual machine
The component executing the program is called a virtual machine, VM for short, when we're working with instructions. We'll make the distinction like this: an evaluator is synonymous with an AST evaluation, whereas a virtual machine runs instructions.
The design decisions of a virtual machine, ie. how it runs and how the program should be, is called the architecture.