Add to chapter 6

This commit is contained in:
Simon From Jakobsen 2024-10-25 11:04:12 +00:00
parent 51e7bb9401
commit bb0b3ddcc3

View File

@ -29,8 +29,36 @@ One downside is that some features of the source languages are *ugly* to impleme
But the primary downside of AST evaluation is performance. While humans are most efficient when using layers of abstractions, computers are not. For various reasons, calling functions recursively repeatedly and jumping through tree-structures is very inefficient for computers. Both in terms of memory footprint and execution time. Computers are much more efficient with sequential execution.
Take the expression defined above. Now imagine we're describing the instructions for how to get the result. We would of course look at the while expression, then break it down. We could then formulate instructions such as *add `3` to `4`, then multiply by `2`*. Now if we *execute* the instructions, we don't start by examening the entire expression, we just execute the instructions in order. We have now translated the expression into linear execution, which computers are very good at running.
Take the expression defined above. Now imagine we're describing the instructions for how to get the result. We would of course look at the while expression, then break it down. We could then formulate *instructions* such as *add 3 to 4, then multiply by 2*. Now if we execute the instructions, we don't start by examening the entire expression, we just execute the instructions in order. We have now translated the expression into linear execution, which computers are very good at running.
## 6.3 Instructions
The 2 main pros of using instructions counter the cons of AST evaluation.
A lesser point pertaining to implementing control flow, is that everything is done using the sequential instructions. This means special control flow such as break, are handled in the same manner as regular control flow like loops.
The primary upside compared to AST evaluation is performance. Running instructions is a sequential operation, which could for example look like an array of instructions, where a location of the current instruction is stored, and then execution loops over the array with a for loop. This is A LOT faster compared to AST evaluation with a tree-structure and recursive function calls. The technical details of why the performance is faster this way is hard to explain both simply and accurately, so I'll spare the explanation. You can think about it like this: the parser generates a tree, to evalute the program, we need to do tree-traversal, tree-traversal is slow, therefore we should minimize the amount of tree-traversal. In the AST evaluator, tree-traversal is used on an expression everytime it is run, in a loop or function call for instance. Instead we'd rather want to do tree-traversal once for the entire program, and then generate these instructions, which does not require tree-traversal to run, ie. we do the costly tree-traversal once instead of multiple times.
The primary downside of this approach compared to AST evaluation is the effort required. AST evalution is both conceptually simple and relatively simple to implement, as it executes the code in just the form the parser outputs, which is also close to the source language. To run instructions instead, we need to translate the program from the AST into runnable sequential instructions. To evalute using instructions, instead of AST evalution, we need to do the following conceptual steps (implementationally seperate in our case):
- Parse source code into AST.
- ~~Evaluate AST.~~
- Resolve symbols. Like how we used the symbol table in the evaluator.
- Check semantic code structure. We can often parse source code, that cannot be run. In the evaluator we had checks different places, that would throw errors. This is the same concept. This will also include type checking in our case.
- Translate (or compile) resolved and checked AST into instructions.
- Execute instructions.
As can be seen, some of the needed steps are steps which are combined in the evaluator. Symbol resolution (resolving) is comparable to how we resolved symbol values (variables) in the evaluator. Semantic code checking is comparative to how we check the AST at different places in the evaluator, such as checking that the `+`-operator is only applied to either 2 ints or 2 strings.
Translation into instructions and seperate executions are new steps. These are also conceptually more advanced than AST evaluation in the sense, that AST evaluation operates on a high level representation of the code meaning it's close to the original source code, whereas instructions are further away, meaning more low level. This means we need to make some information needed for executing instructions explicit, which may be implicit in AST representation because the tree-structure in-an-of-itself contains some of the information.
We also need to design the instructions, meaning we need to choose which instructions should be available (instruction set) and some details of how they should be run. The design decisions in this step is essentially arbitrary, meaning there often is not a *correct* decision, whereas the evaluation is designed exactly to evaluate the AST which in some sense is designed to exactly represent the source code. These design decisions require trade-offs, eg. of perfomance, simplicity, ease of implementation, portability, versatility.
## 6.4 Virtual machine
The component executing the program is called a virtual machine, VM for short, when we're working with instructions. We'll make the distinction like this: an evaluator is synonymous with an AST evaluation, whereas a virtual machine runs instructions.
The design decisions of a virtual machine, ie. how it runs and how the program should be, is called the architecture.