chapter 3, operands, postfix expressions
This commit is contained in:
		
							parent
							
								
									153c71d78e
								
							
						
					
					
						commit
						71d4553523
					
				| @ -417,7 +417,7 @@ while (token !== null) { | |||||||
| 
 | 
 | ||||||
| ## 2.10 Exercises | ## 2.10 Exercises | ||||||
| 
 | 
 | ||||||
| 1. Implement the operators: `-`, `*`, `/`, `(`, `)`, `[`, `]`, `!=`, `<`, `>`, `<=` and `>=`. | 1. Implement the operators: `-`, `*`, `/`, `(`, `)`, `.`, `,`, `,`, `[`, `]`, `!=`, `<`, `>`, `<=` and `>=`. | ||||||
| 2. Implement the keywords: `true`, `false`, `null`, `or`, `and`, `not`, `loop`, `break`, `let`, `fn` and `return`. | 2. Implement the keywords: `true`, `false`, `null`, `or`, `and`, `not`, `loop`, `break`, `let`, `fn` and `return`. | ||||||
| 3. \* Implement single line comments using `//` and multiline comments using `\*` and `*\` (\*\* extra points if multiline comments can be nested, eg. `/* ... /* ... */ ... */`). | 3. \* Implement single line comments using `//` and multiline comments using `\*` and `*\` (\*\* extra points if multiline comments can be nested, eg. `/* ... /* ... */ ... */`). | ||||||
| 4. \* Reimplement integers such that integers are either `0` or start with `[1-9]`. | 4. \* Reimplement integers such that integers are either `0` or start with `[1-9]`. | ||||||
|  | |||||||
| @ -1,11 +1,11 @@ | |||||||
| 
 | 
 | ||||||
| # Parser | # 3 Parser | ||||||
| 
 | 
 | ||||||
| In this chaper I'll show how I would make a parser. | In this chaper I'll show how I would make a parser. | ||||||
| 
 | 
 | ||||||
| A parser, in addition to our lexer, transforms the input program as text, meaning an unstructured sequence of characters, into a structered representation. Structured meaning the representation tells us about the different constructs such as if statements and expressions. | A parser, in addition to our lexer, transforms the input program as text, meaning an unstructured sequence of characters, into a structered representation. Structured meaning the representation tells us about the different constructs such as if statements and expressions. | ||||||
| 
 | 
 | ||||||
| ## Abstract Syntax Tree AST | ## 3.1 Abstract Syntax Tree AST | ||||||
| 
 | 
 | ||||||
| The result of parsing is a tree structure representing the input program. | The result of parsing is a tree structure representing the input program. | ||||||
| 
 | 
 | ||||||
| @ -43,7 +43,7 @@ Both `Stmt` (statement) and `Expr` (expression) are polymorphic types, meaning a | |||||||
| 
 | 
 | ||||||
| For both `Stmt` and `Expr` there's an error-kind. This makes the parser simpler, as we won't need to manage parsing failures differently than successful parslings. | For both `Stmt` and `Expr` there's an error-kind. This makes the parser simpler, as we won't need to manage parsing failures differently than successful parslings. | ||||||
| 
 | 
 | ||||||
| ## Consumer of lexer | ## 3.2 Consumer of lexer | ||||||
| 
 | 
 | ||||||
| To start, we'll implement a `Parser` class, which for now is simply a consumer of a token iterater, meaning the lexer. In simple terms, whereas the lexer is a transformation from text to tokens, the parser is a transformation from token to an AST, except that the parser is not an iterator. | To start, we'll implement a `Parser` class, which for now is simply a consumer of a token iterater, meaning the lexer. In simple terms, whereas the lexer is a transformation from text to tokens, the parser is a transformation from token to an AST, except that the parser is not an iterator. | ||||||
| 
 | 
 | ||||||
| @ -110,14 +110,14 @@ We'll also want a method for reporting errors. | |||||||
| ```ts | ```ts | ||||||
| class Parser { | class Parser { | ||||||
|     // ... |     // ... | ||||||
|     private report(pos: Pos, msg: string) { |     private report(msg: string, pos = this.pos()) { | ||||||
|         console.log(`Parser: ${msg} at ${pos.line}:${pos.col}`); |         console.log(`Parser: ${msg} at ${pos.line}:${pos.col}`); | ||||||
|     } |     } | ||||||
|     // ... |     // ... | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
| 
 | 
 | ||||||
| ## Operands | ## 3.3 Operands | ||||||
| 
 | 
 | ||||||
| Operands are the individual parts of an operation. For example, in the math expression `a + b`, (would be `+ a b` in the input language), `a` and `b` are the *operands*, while `+` is the *operator*. In the expression `a + b * c`, the operands are `a`, `b` and `c`. But in the expression `a * (b + c)`, the operands of the multiply operation are `a` and `(b + c)`. `(b + c)` is an operands, because it is enclosed on both sides. This is how we'll define operands. | Operands are the individual parts of an operation. For example, in the math expression `a + b`, (would be `+ a b` in the input language), `a` and `b` are the *operands*, while `+` is the *operator*. In the expression `a + b * c`, the operands are `a`, `b` and `c`. But in the expression `a * (b + c)`, the operands of the multiply operation are `a` and `(b + c)`. `(b + c)` is an operands, because it is enclosed on both sides. This is how we'll define operands. | ||||||
| 
 | 
 | ||||||
| @ -128,12 +128,8 @@ class Parser { | |||||||
|     // ... |     // ... | ||||||
|     public parseOperand(): Expr { |     public parseOperand(): Expr { | ||||||
|         const pos = this.pos(); |         const pos = this.pos(); | ||||||
|         if (this.test("int")) { |         // ... | ||||||
|             const value = this.current().intValue; |         this.report("expected expr", pos); | ||||||
|             this.step(); |  | ||||||
|             return { kind: { type: "int", value }, pos }; |  | ||||||
|         } |  | ||||||
|         this.report(pos "expected expr"); |  | ||||||
|         this.step(); |         this.step(); | ||||||
|         return { kind: { type: "error" }, pos }; |         return { kind: { type: "error" }, pos }; | ||||||
|     } |     } | ||||||
| @ -141,14 +137,16 @@ class Parser { | |||||||
| } | } | ||||||
| ``` | ``` | ||||||
| 
 | 
 | ||||||
| ### Integer | ### 3.3.1 Identifiers and literals | ||||||
| 
 | 
 | ||||||
| Parsing an integer is a 1:1 translation between the integer token and an integer expression. | Identifiers and literals (integers, strings) are single token constructs, meaning the parsing consists of translating a token into an ast-node with the value. | ||||||
| 
 | 
 | ||||||
| ```ts | ```ts | ||||||
| type ExprKind = | type ExprKind = | ||||||
|     // ... |     // ... | ||||||
|  |     | { type: "ident", value: string } | ||||||
|     | { type: "int", value: number } |     | { type: "int", value: number } | ||||||
|  |     | { type: "string", value: string } | ||||||
|     // ... |     // ... | ||||||
|     ; |     ; | ||||||
| ``` | ``` | ||||||
| @ -158,14 +156,266 @@ class Parser { | |||||||
|     // ... |     // ... | ||||||
|     public parseOperand(): Expr { |     public parseOperand(): Expr { | ||||||
|         // ... |         // ... | ||||||
|  |         if (this.test("ident")) { | ||||||
|  |             const value = this.current().identValue; | ||||||
|  |             this.step(); | ||||||
|  |             return { kind: { type: "ident", value }, pos }; | ||||||
|  |         } | ||||||
|         if (this.test("int")) { |         if (this.test("int")) { | ||||||
|             const value = this.current().intValue; |             const value = this.current().intValue; | ||||||
|             this.step(); |             this.step(); | ||||||
|             return { kind: { type: "int", value }, pos }; |             return { kind: { type: "int", value }, pos }; | ||||||
|         } |         } | ||||||
|  |         if (this.test("string")) { | ||||||
|  |             const value = this.current().stringValue; | ||||||
|  |             this.step(); | ||||||
|  |             return { kind: { type: "string", value }, pos }; | ||||||
|  |         } | ||||||
|         // ... |         // ... | ||||||
|     } |     } | ||||||
|     // ... |     // ... | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
| 
 | 
 | ||||||
|  | ### 3.3.2 Group expressions | ||||||
|  | 
 | ||||||
|  | A group expression is an expression enclosed in parenthesis, eg `(1 + 2)`. Because the expression is enclosed, meaning starts with a `(`-token and ends with a `)`-token, we will treat is like an operand. | ||||||
|  | 
 | ||||||
|  | ```ts | ||||||
|  | type ExprKind = | ||||||
|  |     // ... | ||||||
|  |     | { type: "group", expr: Expr } | ||||||
|  |     // ... | ||||||
|  |     ; | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | If we find a `(`-token in `.parseOperand()`, we know that we should parse a group expression. We do this by ignoring the `(`-token, parsing an expression using `.parseExpr()` and checking that we find a `)`-token afterwards. | ||||||
|  | 
 | ||||||
|  | ```ts | ||||||
|  | class Parser { | ||||||
|  |     // ... | ||||||
|  |     public parseOperand(): Expr { | ||||||
|  |         // ... | ||||||
|  |         if (this.test("(")) { | ||||||
|  |             this.step(); | ||||||
|  |             const expr = this.parseExpr(); | ||||||
|  |             if (!this.test(")")) { | ||||||
|  |                 this.report("expected ')'"); | ||||||
|  |                 return { kind: { type: "error" }, pos }; | ||||||
|  |             } | ||||||
|  |             this.step(); | ||||||
|  |             return { kind: { type: "group", expr }, pos }; | ||||||
|  |         } | ||||||
|  |         // ... | ||||||
|  |     } | ||||||
|  |     // ... | ||||||
|  | } | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | If we do not find the closing `)`-token, we report an error and return an error expression. | ||||||
|  | 
 | ||||||
|  | ### 3.3.3 Block, if and loop operands | ||||||
|  | 
 | ||||||
|  | We want to be able to use blocks, if and loop constructs as expressions. | ||||||
|  | 
 | ||||||
|  | Example: | ||||||
|  | ```rs | ||||||
|  | let temperature_feeling = if > temperature 20 { "hot" } else { "cold" }; | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | Each construct will have their own `.parse...()`-method, so we'll just look for the first `{`-, `if`-, or `loop`-token and call the relevant method. | ||||||
|  | 
 | ||||||
|  | ```ts | ||||||
|  | class Parser { | ||||||
|  |     // ... | ||||||
|  |     public parseOperand(): Expr { | ||||||
|  |         // ... | ||||||
|  |         if (this.test("{")) | ||||||
|  |             return this.parseBlock(); | ||||||
|  |         if (this.test("if")) | ||||||
|  |             return this.parseIf(); | ||||||
|  |         if (this.test("loop")) | ||||||
|  |             return this.parseLoop(); | ||||||
|  |         // ... | ||||||
|  |     } | ||||||
|  |     // ... | ||||||
|  | } | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | ## 3.4 Postfix operators | ||||||
|  | 
 | ||||||
|  | Postfix operations are expressions were the operators come after the subject expression. This includes field expressions (eg. `subject.field`), index expressions (eg. `subject[index]`) and call expressions (eg. `subject(...args)`). | ||||||
|  | 
 | ||||||
|  | A notable detail, is that postfix operations are chainable, eg. `subject[index].field` is valid, likewise with `subject.method(arg)` and `matrix[y][x]`. | ||||||
|  | 
 | ||||||
|  | We'll make a method `.parsePostfix()` to parse postfix operators. | ||||||
|  | 
 | ||||||
|  | ```ts | ||||||
|  | class Parser { | ||||||
|  |     // ... | ||||||
|  |     public parsePostfix(): Expr { | ||||||
|  |         let subject = this.parseOperand(); | ||||||
|  |         while (true) { | ||||||
|  |             const pos = this.pos(); | ||||||
|  |             // ... | ||||||
|  |             break; | ||||||
|  |         } | ||||||
|  |         return subject; | ||||||
|  |     } | ||||||
|  |     // ... | ||||||
|  | } | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | We start by parsing an operand. Then we enter a loop, which runs until we no longer find any relevant operator tokens. When we parse a postfix expression, the `subject` will be replaced with the new parsed expression. | ||||||
|  | 
 | ||||||
|  | Notice we don't define `pos` at the start, but after we've parsed the subject. That's because we want `pos` to the reflect the start of the postfix operator, not the start of the subject. | ||||||
|  | 
 | ||||||
|  | ### 3.4.1 Field expression | ||||||
|  | 
 | ||||||
|  | A field expression is for accessing fields on an object, and consists of a `.`-token and an identifier, eg. `.field`. | ||||||
|  | 
 | ||||||
|  | ```ts | ||||||
|  | type ExprKind = | ||||||
|  |     // ... | ||||||
|  |     | { type: "field", subject: Expr, value: string } | ||||||
|  |     // ... | ||||||
|  |     ; | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | ```ts | ||||||
|  | class Parser { | ||||||
|  |     // ... | ||||||
|  |     public parsePostfix(): Expr { | ||||||
|  |         // ... | ||||||
|  |         while (true) { | ||||||
|  |             // ... | ||||||
|  |             if (this.test(".")) { | ||||||
|  |                 this.step(); | ||||||
|  |                 if (!this.test("ident")) { | ||||||
|  |                     this.report("expected ident"); | ||||||
|  |                     return { kind: { type: "error" }, pos }; | ||||||
|  |                 } | ||||||
|  |                 const value = this.current().identValue; | ||||||
|  |                 this.step(); | ||||||
|  |                 subject = { kind: { type: "field", subject, value }, pos }; | ||||||
|  |                 continue; | ||||||
|  |             } | ||||||
|  |             // ... | ||||||
|  |         } | ||||||
|  |         // ... | ||||||
|  |     } | ||||||
|  |     // ... | ||||||
|  | } | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | If we find a `.`-token, we step over it, and make sure that we've hit an identifier. We save the identifier value and step over the identifier. Then we replace `subject` with a new field expression containing the previous `subject` value. Then we continue to look for the next postfix operator. | ||||||
|  | 
 | ||||||
|  | ### 3.4.2 Index expression | ||||||
|  | 
 | ||||||
|  | An index operation consists of the subject and an index. The index is an expression, and it is contained in `[`- and `]`-tokens, eg. `subject[value]`. | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | ```ts | ||||||
|  | type ExprKind = | ||||||
|  |     // ... | ||||||
|  |     | { type: "index", subject: Expr, value: Expr } | ||||||
|  |     // ... | ||||||
|  |     ; | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | ```ts | ||||||
|  | class Parser { | ||||||
|  |     // ... | ||||||
|  |     public parsePostfix(): Expr { | ||||||
|  |         // ... | ||||||
|  |         while (true) { | ||||||
|  |             // ... | ||||||
|  |             if (this.test("[")) { | ||||||
|  |                 this.step(); | ||||||
|  |                 const value = this.parseExpr(); | ||||||
|  |                 if (!this.test("]") { | ||||||
|  |                     this.report("expected ']'"); | ||||||
|  |                     return { kind: { type: "error" }, pos }; | ||||||
|  |                 } | ||||||
|  |                 this.step(); | ||||||
|  |                 subject = { kind: { type: "index", subject, value }, pos }; | ||||||
|  |                 continue; | ||||||
|  |             } | ||||||
|  |             // ... | ||||||
|  |         } | ||||||
|  |         // ... | ||||||
|  |     } | ||||||
|  |     // ... | ||||||
|  | } | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | If we find a `[`-token, we parse the index part exactly the same way, we parse a group expression. | ||||||
|  | 
 | ||||||
|  | ### 3.4.3 Call expression | ||||||
|  | 
 | ||||||
|  | A call expression is like an index expression, except that it uses `(` and `)` instead of `[` and `]` and that there can be 0 or more expressions (arguments or args) inside the `(` and `)`. The arguments are seperated by `,`. | ||||||
|  | 
 | ||||||
|  | ```ts | ||||||
|  | type ExprKind = | ||||||
|  |     // ... | ||||||
|  |     | { type: "call", subject: Expr, args: Expr[] } | ||||||
|  |     // ... | ||||||
|  |     ; | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | ```ts | ||||||
|  | class Parser { | ||||||
|  |     // ... | ||||||
|  |     public parsePostfix(): Expr { | ||||||
|  |         // ... | ||||||
|  |         while (true) { | ||||||
|  |             // ... | ||||||
|  |             if (this.test("(")) { | ||||||
|  |                 this.step(); | ||||||
|  |                 let args: Expr[] = []; | ||||||
|  |                 if (!this.test(")") { | ||||||
|  |                     args.push(this.parseExpr()); | ||||||
|  |                     while (this.test(",")) { | ||||||
|  |                         this.step(); | ||||||
|  |                         if (this.test(")")) | ||||||
|  |                             break; | ||||||
|  |                         args.push(this.parseExpr()); | ||||||
|  |                     } | ||||||
|  |                 } | ||||||
|  |                 const value = this.parseExpr(); | ||||||
|  |                 if (!this.test(")") { | ||||||
|  |                     this.report("expected ')'"); | ||||||
|  |                     return { kind: { type: "error" }, pos }; | ||||||
|  |                 } | ||||||
|  |                 this.step(); | ||||||
|  |                 subject = { kind: { type: "call", subject, args }, pos }; | ||||||
|  |                 continue; | ||||||
|  |             } | ||||||
|  |             // ... | ||||||
|  |         } | ||||||
|  |         // ... | ||||||
|  |     } | ||||||
|  |     // ... | ||||||
|  | } | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | Similarly to index epxressions, if we find a `(`-token, we step over it, parse the arguments, check for a `)` and replace `subject` with a call expression containing the previous `subject`. | ||||||
|  | 
 | ||||||
|  | When parsing the arguments, we start by testing if we've reached a `)` to check if there are any arguments. If not, we parse the first argument. | ||||||
|  | 
 | ||||||
|  | The consecutive arguments are all preceded by a `,`-token. There we test or `,`, to check if we should keep parsing arguments. | ||||||
|  | 
 | ||||||
|  | After checking for a seperating `,`, we check if we've reached a `)` and break if so. This is to allow for trailing comma, eg. | ||||||
|  | ```ts | ||||||
|  | func( | ||||||
|  |     a, | ||||||
|  |     b, // trailing comma | ||||||
|  | ) | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | ### 3.5 Prefix expressions | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | |||||||
		Loading…
	
	
			
			x
			
			
		
	
		Reference in New Issue
	
	Block a user
	 Simon From Jakobsen
						Simon From Jakobsen