Aims:
- Describe the pipeline, and its components, getting from aiken to uplc.
## Preface
Aiken is undergoing active development.
This post was started around Aiken v1.14.
With Aiken v1.15, there were already reasonably significant changes to the compilation pipeline.
The word is that no changes as big are expected in the near future, but
this article will undoubtedly begin to diverge from the current codebase even before publishing.
## Aiken build
Tracing `aiken build`, the pipeline is roughly:
```
. -> Project::read_source_files ->
Vec<Source> -> Project::parse_sources ->
ParsedModules -> Project::type_check ->
CheckedModules -> CodeGenerator::build ->
AirTree -> AirTree::to_vec ->
Vec<Air> -> CodeGenerator::uplc_code_gen ->
Program / Term<Name> -> serialize ->
.
```
We'll pick our way through these steps.
At a high level we are trying to do something straightforward: reformulate aiken code as uplc.
Some aiken expressions are relatively easy to handle, for example an aiken `Int` goes to an `Int` in uplc.
Some aiken expressions require more involved handling, for example an aiken `if ... else if ... else`
must have the branches "nested" in uplc.
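
To make the "nesting" concrete, here is a minimal sketch using a toy expression type.
`Expr`, `nest_branches` and `eval` are invented for illustration and are not Aiken's or uplc's actual types.
The only assumption is that the target has nothing but a two-armed conditional, so a flat chain of branches must be folded from the last branch outwards.

```rust
/// Toy expression type, invented for this sketch
/// (not Aiken's `TypedExpr` nor uplc's `Term`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Bool(bool),
    Int(i64),
    /// A two-armed conditional: the only branching form in the toy target.
    IfThenElse(Box<Expr>, Box<Expr>, Box<Expr>),
}

/// Fold a flat chain `if c1 { b1 } else if c2 { b2 } ... else { final_else }`
/// into nested two-armed conditionals, starting from the last branch.
fn nest_branches(branches: Vec<(Expr, Expr)>, final_else: Expr) -> Expr {
    branches
        .into_iter()
        .rev()
        .fold(final_else, |otherwise, (condition, then)| {
            Expr::IfThenElse(Box::new(condition), Box::new(then), Box::new(otherwise))
        })
}

/// Evaluate a toy expression; panics if a condition is not a boolean.
fn eval(expr: &Expr) -> Expr {
    match expr {
        Expr::IfThenElse(condition, then, otherwise) => match eval(condition) {
            Expr::Bool(true) => eval(then),
            Expr::Bool(false) => eval(otherwise),
            other => panic!("condition must be a boolean, got {:?}", other),
        },
        value => value.clone(),
    }
}
```

Calling `nest_branches` with two branches and a final `else` yields one conditional whose `else` arm is itself a conditional.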
### The Preamble
#### cli handling
The cli enters at `aiken/src/cmd/mod.rs` which parses the command.
With some establishing of context, the program enters `Project::build` (`crates/aiken-project/src/lib.rs`),
which in turn calls `Project::compile`.
#### File crawl
The program looks for aiken files in both the `./lib` and `./validators` subdirs.
For each, it walks recursively over all contents looking for files with the `.ak` extension.
It treats these two sets of files a little differently.
Only validator files can contain the special validator functions.
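
A crawl of this kind can be sketched with the standard library alone.
This is not the code Aiken uses: `find_aiken_files`, `crawl_project` and `Sources` are hypothetical names, and treating a missing directory as empty is an assumption of the sketch.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect every file under `dir` whose extension is `ak`.
/// A missing directory is treated as empty (an assumption of this sketch).
fn find_aiken_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    if !dir.is_dir() {
        return Ok(found);
    }
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            found.extend(find_aiken_files(&path)?);
        } else if path.extension().and_then(|ext| ext.to_str()) == Some("ak") {
            found.push(path);
        }
    }
    // Sort so the result does not depend on directory iteration order.
    found.sort();
    Ok(found)
}

/// The two sets of files are kept apart because they are treated differently.
#[derive(Debug)]
struct Sources {
    lib: Vec<PathBuf>,
    validators: Vec<PathBuf>,
}

/// Crawl both subdirs of a project root.
fn crawl_project(root: &Path) -> io::Result<Sources> {
    Ok(Sources {
        lib: find_aiken_files(&root.join("lib"))?,
        validators: find_aiken_files(&root.join("validators"))?,
    })
}
```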
#### Parse and Type check
`Project::parse_sources` parses the module source code.
The heavy lifting is done by `aiken_lang::parser::module`, which is evaluated on each file.
It produces a `Module` containing a list of parsed definitions of the file: functions, types _etc_,
together with "metadata" like docstrings and the file path.
`Project::type_check` inspects the parsed modules and, as the name implies, checks the types.
It flags type level warnings and errors.
It constructs a hash map of `CheckedModule`s.
#### Code generator
The code generator `CodeGenerator` (`aiken-lang/src/gen_uplc.rs`) is given
the definitions found from the previous step,
together with the plutus builtins.
It has additional fields for things like debugging.
This is handed over to a `Blueprint` (`aiken-project/src/blueprint/mod.rs`).
A blueprint does little more than find the validators on which to run the code gen.
The heavy lifting is done by `CodeGenerator::generate`.
We are now ready to take the source code and create plutus.
### Up in the air
Things become a bit intimidating at this point in terms of sheer lines of code:
`gen_uplc.rs` and three modules in `gen_uplc/` total more than 8500 LoC.
Aiken has its own _intermediate representation_ called `air` (as in Aiken Intermediate Representation).
Intermediate representations are common in compilers.
`Air` is defined in `aiken-lang/src/gen_uplc/air.rs`.
Unsurprisingly, it looks a little bit like a language between aiken and plutus.
In fact, Aiken has another intermediate representation: `AirTree`.
This is constructed between the `TypedExpr` and `Vec<Air>`, _ie_ between parsed aiken and air.
#### AirTree
Within `CodeGenerator::generate`, `CodeGenerator::build` is called on the function body.
This constructs and returns an `AirTree`.
More on what an airtree is and its construction below.
At the same time `self` is treated as `mut`, so we need to keep an eye on this too.
The method that is called and makes use of this mutability is `self.assignment`.
It does so via the call chain
```sample
self.assignment >> self.expect_type_assign >> self.code_gen_functions.insert
```
and thus creates a hashmap of all the functions that appear in the definition.
(`self.handle_each_clause` is also called with `mut`, which it needs because it in turn calls `self.build`.
`self.clause_pattern` is called with `mut` but doesn't use it.)
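
The pattern of a tree-walking method that also fills a map on `self` can be sketched as follows.
Everything here is a toy: `ToyGenerator` and `Expr` are invented, and what the map stores (a count of call sites) is an assumption made only so the sketch has something to record.
The point is just that one nested mutation forces `&mut self` on every method up the call chain.

```rust
use std::collections::HashMap;

/// Toy expression type, invented for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Int(i64),
    Call { function: String, args: Vec<Expr> },
}

/// Toy generator: `build` returns a value *and* records functions as a side effect.
#[derive(Debug, Default)]
struct ToyGenerator {
    /// Stand-in for `code_gen_functions`: function name -> call sites seen (hypothetical).
    code_gen_functions: HashMap<String, usize>,
}

impl ToyGenerator {
    /// Walk the expression and return its node count.
    /// `&mut self` is needed only because of the insert into the map,
    /// but that is enough to make every caller take `&mut self` too.
    fn build(&mut self, expr: &Expr) -> usize {
        match expr {
            Expr::Int(_) => 1,
            Expr::Call { function, args } => {
                *self.code_gen_functions.entry(function.clone()).or_insert(0) += 1;
                1 + args.iter().map(|arg| self.build(arg)).sum::<usize>()
            }
        }
    }
}
```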
###### Codegen assignment
~200 LoC
###### Codegen expect type assign
~400 LoC
###### ... Back to build
Validators in aiken are boolean functions while in uplc they are unit-valued (aka void-valued) functions.
Thus the airtree is wrapped such that `false` results in an error (`wrap_validator_condition`).
(Ed: I don't know why there is a prevailing thought that boolean functions are preferable to functions
that simply error if anything is wrong.)
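
The idea of the wrapping can be sketched on a toy tree.
`Tree`, `wrap_condition` and `eval` below are invented for illustration and say nothing about how `wrap_validator_condition` is actually written; the sketch only assumes that `true` should become unit and `false` should become an error.

```rust
/// Toy tree, invented for this sketch (not Aiken's `AirTree`).
#[derive(Debug, Clone, PartialEq)]
enum Tree {
    Bool(bool),
    Unit,
    Error,
    If(Box<Tree>, Box<Tree>, Box<Tree>),
}

/// Wrap a boolean-valued body so that `true` yields unit and `false` yields an error.
fn wrap_condition(body: Tree) -> Tree {
    Tree::If(Box::new(body), Box::new(Tree::Unit), Box::new(Tree::Error))
}

/// Evaluate a toy tree. Reaching `Error` aborts evaluation with `Err`.
fn eval(tree: &Tree) -> Result<Tree, String> {
    match tree {
        Tree::Error => Err("validation failed".to_string()),
        Tree::If(condition, then, otherwise) => match eval(condition)? {
            Tree::Bool(true) => eval(then),
            Tree::Bool(false) => eval(otherwise),
            other => Err(format!("condition must be a boolean, got {:?}", other)),
        },
        value => Ok(value.clone()),
    }
}
```

A body evaluating to `true` gives `Ok(Tree::Unit)` once wrapped, while a body evaluating to `false` gives an `Err`.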
`check_validator_args` again extends the airtree from the previous step,
and again calls `self.assignment` mutating self.
Something interesting is happening here.
Script context is the final argument of a validator - for any script purpose.
`check_validator_args` treats the script context like it is an unused argument.
We'll circle back to how this works later on.
Next we encounter
```rust
AirTree::no_op().hoist_over(validator_args_tree);
```
It's not very apparent why we need to do this. Let's look ahead and consider this later.
The final airtree step(s) are in `self.hoist_functions_to_validator`.
TODO: What happens here?!
Note that `AirTree` and its methods aren't fully typesafe.
For example `hoist_over` will throw an error if called on an `Expression`.
As `AirTree` is for internal use only, the scope for potential problems is reasonably contained.
The AirTree has the following definition:
```rust
pub enum AirTree {
    Statement {
        statement: AirStatement,
        hoisted_over: Option<Box<AirTree>>,
    },
    Expression(AirExpression),
    UnhoistedSequence(Vec<AirTree>),
}
```
We can see it has a tree-like structure, as the name suggests.
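
To get a feel for hoisting, and for flattening a tree into a vector, here is a toy version with the same three-constructor shape.
`ToyTree` uses plain strings in place of `AirStatement` and `AirExpression`, and the bodies of `hoist_over` and `to_vec` are guesses at the behaviour described in this post rather than copies of Aiken's code.

```rust
/// Toy tree with the same shape as `AirTree`, using strings as payloads.
#[derive(Debug, Clone, PartialEq)]
enum ToyTree {
    Statement {
        statement: String,
        hoisted_over: Option<Box<ToyTree>>,
    },
    Expression(String),
    UnhoistedSequence(Vec<ToyTree>),
}

impl ToyTree {
    /// Place `self` "over" `next`, so that `next` becomes what follows the statement.
    /// Not type safe: panics when called on an expression.
    fn hoist_over(self, next: ToyTree) -> ToyTree {
        match self {
            ToyTree::Statement { statement, hoisted_over: None } => ToyTree::Statement {
                statement,
                hoisted_over: Some(Box::new(next)),
            },
            ToyTree::Statement { .. } => panic!("statement is already hoisted over something"),
            ToyTree::Expression(_) => panic!("cannot hoist an expression over another tree"),
            // Chain the sequence from the last statement backwards.
            ToyTree::UnhoistedSequence(sequence) => sequence
                .into_iter()
                .rev()
                .fold(next, |rest, item| item.hoist_over(rest)),
        }
    }

    /// Flatten the tree into a vector, each statement before whatever it is hoisted over.
    fn to_vec(&self) -> Vec<String> {
        match self {
            ToyTree::Statement { statement, hoisted_over } => {
                let mut out = vec![statement.clone()];
                if let Some(next) = hoisted_over {
                    out.extend(next.to_vec());
                }
                out
            }
            ToyTree::Expression(expression) => vec![expression.clone()],
            ToyTree::UnhoistedSequence(sequence) => {
                sequence.iter().flat_map(|tree| tree.to_vec()).collect()
            }
        }
    }
}
```

Hoisting a sequence of two statements over an expression gives a chain of statements ending in that expression, which `to_vec` then lists in order.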
`AirExpression` has multiple constructors. These include (non-exhaustive):
- air primitives (including all the ones that appear in plutus)
- constructors `Call` and `Fn` to handle functions
- binary and unary operators
- handling when and if
- error and tracing
`AirStatement` also has multiple constructors,
for handling functions and plutus primitives, along with ... [TODO]
## Down to uplc
## Air
Aiken compiles aiken code to uplc via _air_:
Aiken Intermediate Representation.
## Trace
Running `aiken build`...
The cli (see `aiken/src/cmd/mod.rs`) parses the command,
finds the context and calls `Project::build` (`crates/aiken-project/src/lib.rs`),
which in turn calls `Project::compile`.
#### `Project::compile`
1. Check dependencies are available, _eg_ aiken stdlib.
2. Read source files.
   1. Walk over `./lib` and `./validators` and push aiken modules onto `Project.sources`.
3. Parse each source in sources:
   1. Generate a `ParsedModule` containing the `ast`, `docs`, _etc_.
      The `ast` here is an `UntypedModule`, which contains untyped definitions.
4. Type check each parsed module.
   1. For each untyped module, create a `CheckedModule`.
      This includes typed definitions.
5. `compile` forks into two depending on whether it's been called with `build` or `check`.
6. From `CheckedModules` construct a `CodeGenerator`.
7. Pass the generator to construct a new `Blueprint`.
   1. The blueprint finds validators from checked modules.
   2. From each it constructs a `Validator` with the constructor `Validator::from_checked_module` (which returns a vector of validators).
      1. It's here that the magic happens: the method `generator.generate(def)` is called,
         where `def` is the typed validator(s).
         This method outputs a `Program<Name>` which contains the UPLC.
      2. These are collected together.
   3. The rest is collecting and handling the errors and warnings and writing the blueprint.
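
The overall shape of these steps can be sketched as a chain of fallible stages.
All of the types and functions below are toy stand-ins invented for this sketch; in particular, having `Check` simply stop after type checking is a simplification, and the "parser" merely splits lines.

```rust
/// Toy stand-ins for the stages of the pipeline (all invented for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct ToySource {
    name: String,
    code: String,
    is_validator: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct ToyParsed {
    name: String,
    definitions: Vec<String>,
    is_validator: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct ToyChecked {
    name: String,
    definitions: Vec<String>,
    is_validator: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct ToyProgram {
    validator: String,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Build,
    Check,
}

/// "Parse" a source: every non-empty line counts as one definition.
fn parse(source: &ToySource) -> Result<ToyParsed, String> {
    let definitions: Vec<String> = source
        .code
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect();
    if definitions.is_empty() {
        return Err(format!("{}: nothing to parse", source.name));
    }
    Ok(ToyParsed {
        name: source.name.clone(),
        definitions,
        is_validator: source.is_validator,
    })
}

/// "Type check" a parsed module; the toy version accepts everything.
fn type_check(module: ToyParsed) -> Result<ToyChecked, String> {
    Ok(ToyChecked {
        name: module.name,
        definitions: module.definitions,
        is_validator: module.is_validator,
    })
}

/// "Generate" a program for a validator module.
fn generate(module: &ToyChecked) -> ToyProgram {
    ToyProgram { validator: module.name.clone() }
}

/// Run the stages in order, stopping at the first error.
fn compile(sources: &[ToySource], mode: Mode) -> Result<Vec<ToyProgram>, String> {
    let parsed = sources.iter().map(parse).collect::<Result<Vec<_>, _>>()?;
    let checked = parsed
        .into_iter()
        .map(type_check)
        .collect::<Result<Vec<_>, _>>()?;
    match mode {
        Mode::Check => Ok(Vec::new()),
        Mode::Build => Ok(checked.iter().filter(|m| m.is_validator).map(generate).collect()),
    }
}
```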
#### `CodeGenerator::generate`
1. Create a new `AirStack`.
#### `AirStack`
Consists of:
1. An Id
2. A `Scope`
3. A vector of `Air`
The Scope keeps track of ... [TODO]
#### Air
Air is a typed language... [TODO]