edits to tracing aiken
parent f3b88b8446
commit 38e68b5316
@ -0,0 +1,275 @@
Aims:

> Describe the pipeline and components that take us from aiken to uplc.

## The Preface

### Motivations

The motivation for writing this came from a desire to add features to aiken that are not yet available.
One such feature would evaluate an arbitrary aiken function, callable from javascript.
This would help a lot with testing, and with keeping on-chain and off-chain code aligned.

Another, more pipe-dreamy, feature is ad hoc function extraction: from a span of code, generate a function.
A digression to answer _why would this be at all helpful?!_
Validator logic often needs a broad context throughout.
How then best to factor code?
Possible solutions:

1. Introduce types / structs
2. Have functions with lots of arguments
3. Don't

The problems are:

1. This requires relentless constructing and deconstructing across the function call,
   and this adds costs in aiken.
2. It becomes tedious keeping the definition and the function call aligned.
3. You end up with very long validators, which are hard to unit test.

My current preferred way is to accept that validator functions are long.
Ad hoc function extraction would allow sections of code to be tested without needing to be factored out.

To do either of these, we need to get to grips with the aiken compilation pipeline.

### This won't age well

Aiken is undergoing active development.
This post started life with Aiken ~v1.14.
With Aiken v1.15, there were already reasonably significant changes to the compilation pipeline.
The word is that there won't be changes as big as these in the near future,
but this article will undoubtedly begin to diverge from the current codebase even before publishing.

### Limitations of narrating code

Narrating code becomes a compromise between being honest and accurate, and being readable and digestible.
Following the command `aiken build` through the codebase covers well in excess of 10,000 LoC.
The writing of this post ground slowly to a halt as it progressed deeper into the code,
with the details seeming to increase in importance.
At some point I had to draw a line and resign myself to the fact that some parts will remain black boxes for now.

## Aiken build

Tracing `aiken build`, the pipeline is roughly:
```
. -> Project::read_source_files ->
Vec<Source> -> Project::parse_sources ->
ParsedModules -> Project::type_check ->
CheckedModules -> CodeGenerator::build ->
AirTree -> AirTree::to_vec ->
Vec<Air> -> CodeGenerator::uplc_code_gen ->
Program / Term<Name> -> serialize ->
.
```
We'll pick our way through these steps.
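To make the shape of this pipeline concrete, here is a toy sketch in which each stage is an ordinary function from one stage's output type to the next, so the types force the order of the arrows. Every type and function below is a hypothetical stand-in named after the stages above; the real aiken types carry far more information and the real stages do far more work.

```rust
// Toy stand-ins for the pipeline's intermediate types (hypothetical, heavily simplified).
#[derive(Debug, Clone)]
pub struct Source {
    pub name: String,
}

#[derive(Debug, Clone)]
pub struct ParsedModule {
    pub name: String,
}

#[derive(Debug, Clone)]
pub struct CheckedModule {
    pub name: String,
}

#[derive(Debug, Clone)]
pub enum AirTree {
    Leaf(String),
    Seq(Vec<AirTree>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Air(pub String);

#[derive(Debug, Clone, PartialEq)]
pub struct Program(pub Vec<String>);

// Each stage maps the previous stage's output to the next stage's input,
// mirroring one arrow of the pipeline diagram.
pub fn parse_sources(sources: Vec<Source>) -> Vec<ParsedModule> {
    sources.into_iter().map(|s| ParsedModule { name: s.name }).collect()
}

pub fn type_check(parsed: Vec<ParsedModule>) -> Vec<CheckedModule> {
    parsed.into_iter().map(|p| CheckedModule { name: p.name }).collect()
}

pub fn build(checked: &[CheckedModule]) -> AirTree {
    AirTree::Seq(checked.iter().map(|m| AirTree::Leaf(m.name.clone())).collect())
}

pub fn to_vec(tree: AirTree) -> Vec<Air> {
    match tree {
        AirTree::Leaf(s) => vec![Air(s)],
        AirTree::Seq(children) => children.into_iter().flat_map(to_vec).collect(),
    }
}

pub fn uplc_code_gen(air: Vec<Air>) -> Program {
    Program(air.into_iter().map(|a| a.0).collect())
}

pub fn serialize(program: &Program) -> String {
    program.0.join(" ")
}

// The whole pipeline is the composition of the stages, in diagram order.
pub fn compile(sources: Vec<Source>) -> String {
    let checked = type_check(parse_sources(sources));
    serialize(&uplc_code_gen(to_vec(build(&checked))))
}
```

For example, two sources named `a` and `b` come out of `compile` as the string `a b`.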

At a high level we are trying to do something straightforward: reformulate aiken code as uplc.
Some aiken expressions are relatively easy to handle: for example, an aiken `Int` goes to an `Int` in uplc.
Others require more involved handling: for example, the branches of an aiken `if ... else if ... else` must be "nested" in uplc.
Aiken also has lots of nice-to-haves like pattern matching, modules, and generics.
Uplc has none of these.
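As a toy illustration of that nesting (hypothetical types, not aiken's `Air` or uplc terms), the sketch below folds a flat chain of `(condition, body)` branches plus a final `else` into binary if-then-else nodes, each with exactly one condition and two branches.

```rust
// A flat chain as written in source: if c1 { .. } else if c2 { .. } else { .. }
#[derive(Debug, Clone, PartialEq)]
pub struct IfChain {
    pub branches: Vec<(String, String)>, // (condition, body) pairs, in source order
    pub final_else: String,
}

// The nested form: every `If` has exactly one condition and two branches.
#[derive(Debug, Clone, PartialEq)]
pub enum Nested {
    Leaf(String),
    If {
        cond: String,
        then_: Box<Nested>,
        else_: Box<Nested>,
    },
}

// Fold from the last branch backwards, so each `else if`
// becomes the `else` branch of the `if` that precedes it.
pub fn nest(chain: &IfChain) -> Nested {
    chain.branches.iter().rev().fold(
        Nested::Leaf(chain.final_else.clone()),
        |acc, (cond, body)| Nested::If {
            cond: cond.clone(),
            then_: Box::new(Nested::Leaf(body.clone())),
            else_: Box::new(acc),
        },
    )
}

// Render in a lisp-like notation so the nesting is visible.
pub fn render(n: &Nested) -> String {
    match n {
        Nested::Leaf(s) => s.clone(),
        Nested::If { cond, then_, else_ } => {
            format!("(if {} {} {})", cond, render(then_), render(else_))
        }
    }
}
```

For branches `c1 -> a`, `c2 -> b` and a final `z`, `render` gives `(if c1 a (if c2 b z))`: every `else if` has become the `else` branch of the `if` before it.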

### The Preamble

#### Cli handling

The cli enters at `aiken/src/cmd/mod.rs`, which parses the command.
After establishing some context, the program enters `Project::build` (`crates/aiken-project/src/lib.rs`),
which in turn calls `Project::compile`.

#### File crawl

The program looks for aiken files in both the `./lib` and `./validators` subdirectories.
For each, it walks over all the contents (recursively), looking for files with the `.ak` extension.
It treats these two sets of files a little differently.
For example, only validator files can contain the special validator functions.
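A minimal sketch of such a crawl using only `std::fs`, assuming nothing beyond what is described above: walk a directory recursively and collect every path ending in `.ak`, keeping the `lib` and `validators` sets separate. The names `collect_ak_files` and `crawl_project` are hypothetical, and aiken's own implementation may differ, for example in ordering or in how it records which root a file came from.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Recursively walk `dir`, pushing every file with the `.ak` extension onto `out`.
pub fn collect_ak_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_ak_files(&path, out)?;
        } else if path.extension().map_or(false, |ext| ext == "ak") {
            out.push(path);
        }
    }
    Ok(())
}

// Crawl `lib` and `validators` separately so the caller can treat the two sets
// differently (e.g. only accepting validator definitions from the second).
pub fn crawl_project(root: &Path) -> io::Result<(Vec<PathBuf>, Vec<PathBuf>)> {
    let mut lib = Vec::new();
    let mut validators = Vec::new();
    for (sub, out) in [("lib", &mut lib), ("validators", &mut validators)] {
        let dir = root.join(sub);
        if dir.is_dir() {
            collect_ak_files(&dir, out)?;
        }
    }
    // Sort for a deterministic order; `read_dir` makes no ordering guarantee.
    lib.sort();
    validators.sort();
    Ok((lib, validators))
}
```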

#### Parse and Type check

`Project::parse_sources` parses the module source code.
The heavy lifting is done by `aiken_lang::parser::module`, which is evaluated on each file.
It produces a `Module` containing a list of the file's parsed definitions (functions, types _etc_),
together with metadata like docstrings and the file path.

`Project::type_check` inspects the parsed modules and, as the name implies, checks the types.
It flags type-level warnings and errors and constructs a hash map of `CheckedModule`s.

#### Code generator

The code generator `CodeGenerator` (`aiken-lang/src/gen_uplc.rs`) is given
the definitions found in the previous step,
together with the plutus builtins.
It also has additional fields for things like debugging.

This is handed over to a `Blueprint` (`aiken-project/src/blueprint/mod.rs`).
The blueprint does little more than find the validators on which to run the code gen.
The heavy lifting is done by `CodeGenerator::generate`.

We are now ready to take the source code and create plutus.

### In the air

Things become a bit intimidating at this point in terms of sheer lines of code:
`gen_uplc.rs` and the three modules in `gen_uplc/` total > 8500 LoC.

Aiken has its own _intermediate representation_ called `air` (as in Aiken Intermediate Representation).
Intermediate representations are common in compiled languages.
`Air` is defined in `aiken-lang/src/gen_uplc/air.rs`.
Unsurprisingly, it looks a little like a language between aiken and plutus.

In fact, Aiken has another intermediate representation: `AirTree`.
This is constructed between the `TypedExpr` and the `Vec<Air>`, ie between type-checked aiken and air.

#### Climbing the AirTree

Within `CodeGenerator::generate`, `CodeGenerator::build` is called on the function body.
This takes a `TypedExpr` and constructs and returns an `AirTree`.
The construction is recursive, as it traverses the recursive `TypedExpr` data structure.
More on what an airtree is and how it is constructed below.
At the same time `self` is treated as `mut`, so we need to keep an eye on this too.
The method that is called and makes use of this mutability of `self` is `self.assignment`.
It does so via
```sample
self.assignment >> self.expect_type_assign >> self.code_gen_functions.insert
```
and thus creates a hashmap of all the functions that appear in the definition.
The path from the call to `assignment` to its return covers > 600 LoC, so we'll otherwise leave this as a black box.
(`self.handle_each_clause` is also called with `mut`, since it in turn calls `self.build`, for which `mut` is needed.)

Validators in aiken are boolean functions, while in uplc they are unit-valued (aka void-valued) functions.
Thus the airtree is wrapped such that `false` results in an error (`wrap_validator_condition`).
I don't know why there is a prevailing thought that boolean functions are preferable to functions
that error if anything is wrong, which is what validators are.
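A toy sketch of what that wrapping amounts to, written against a hypothetical miniature term type rather than aiken's `AirTree`: the boolean-valued body becomes the condition of an if-then-else whose true branch is unit and whose false branch is an error. `wrap_condition` is a hypothetical stand-in for `wrap_validator_condition`, and the evaluator exists only so the behaviour can be checked.

```rust
// A tiny hypothetical term language, just rich enough to show the wrapping.
#[derive(Debug, Clone, PartialEq)]
pub enum Term {
    Bool(bool),
    Unit,
    Error,
    IfThenElse(Box<Term>, Box<Term>, Box<Term>),
}

// Turn a boolean-valued validator body into a unit-valued one:
// `true` evaluates to unit, `false` evaluates to an error.
pub fn wrap_condition(body: Term) -> Term {
    Term::IfThenElse(Box::new(body), Box::new(Term::Unit), Box::new(Term::Error))
}

// Minimal evaluator: Ok(value) on success, Err(()) once an `Error` term is reached.
pub fn eval(t: &Term) -> Result<Term, ()> {
    match t {
        Term::Error => Err(()),
        Term::IfThenElse(c, th, el) => match eval(c)? {
            Term::Bool(true) => eval(th),
            Term::Bool(false) => eval(el),
            _ => Err(()), // the condition was not a boolean
        },
        other => Ok(other.clone()),
    }
}
```

Seen this way, a wrapped validator either returns unit or fails, which is exactly the uplc contract.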

`check_validator_args` again extends the airtree from the previous step,
and again calls `self.assignment`, mutating `self`.
Something interesting is happening here.
The script context is the final argument of a validator, for any script purpose.
`check_validator_args` treats the script context as if it were an unused argument.
The importance of this is not immediately clear, and I've yet to appreciate why this happens.

Let's take a look at what `AirTree` actually is:
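```rust
pub enum AirTree {
    // A statement (let, expect, pattern match, ...) together with
    // the tree it has been hoisted over, if any.
    Statement {
        statement: AirStatement,
        hoisted_over: Option<Box<AirTree>>,
    },
    // A value-producing expression.
    Expression(AirExpression),
    // A sequence of trees not (yet) hoisted into one another.
    UnhoistedSequence(Vec<AirTree>),
}
```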
Note that `AirStatement` and `AirExpression` are defined mutually recursively with `AirTree`;
without noticing this, it is unclear at first inspection how tree-like this really is.

`AirExpression` has multiple constructors. These include (non-exhaustively):

- air primitives (including all the ones that appear in plutus)
- constructors `Call` and `Fn` to handle anonymous functions
- binary and unary operators
- handling of `when` and `if`
- handling of errors and tracing

`AirStatement` also has multiple constructors. These include:

- let assignments and named function definitions
- handling of expect assignments
- pattern matching
- unwrapping data structures

Note that `AirTree` has many methods that are partial functions,
in that there are possible states that are not considered legitimate
at different points of its construction and use.
For example, `hoist_over` will throw an error if called on an `Expression`.
As `AirTree` is for internal use only, the scope for potential problems is reasonably contained.
It seems likely this is to avoid similar-yet-different IRs between steps.
However, the trade-off is that it partially obfuscates which states are valid where.
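To see what partiality means here, the toy below mirrors only the shape of `AirTree`, with strings standing in for `AirStatement` and `AirExpression`. Its `hoist_over` accepts a `Statement` and rejects anything else; the text above describes the real method as throwing an error on an `Expression`, which is modelled here as an `Err`. How the real method treats `UnhoistedSequence`, or a statement that is already hoisted over something, is not covered; those behaviours below are choices made for the sketch.

```rust
// Shape-only mirror of `AirTree`; strings stand in for statements and expressions.
#[derive(Debug, Clone, PartialEq)]
pub enum Tree {
    Statement {
        statement: String,
        hoisted_over: Option<Box<Tree>>,
    },
    Expression(String),
    UnhoistedSequence(Vec<Tree>),
}

impl Tree {
    // Only a statement can be hoisted over another tree; any other variant
    // is an invalid state for this call and is reported as an error.
    // (This toy simply replaces any existing `hoisted_over`.)
    pub fn hoist_over(self, next: Tree) -> Result<Tree, String> {
        match self {
            Tree::Statement { statement, .. } => Ok(Tree::Statement {
                statement,
                hoisted_over: Some(Box::new(next)),
            }),
            Tree::Expression(_) => Err("hoist_over called on an Expression".to_string()),
            Tree::UnhoistedSequence(_) => {
                Err("hoist_over called on an UnhoistedSequence".to_string())
            }
        }
    }
}
```

Returning a `Result` makes the partiality visible in the signature, at the cost of every caller handling a case that should never arise in practice.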

What is hoisting? Hoisting gives the airtree depth.
The motivation is that by the time we hit uplc it is "generally better" that:

- function definitions appear once rather than being inlined multiple times
- the definition appears as close to its use as possible

Hoisting creates tree paths.
The final airtree-to-airtree step is `self.hoist_functions_to_validator`, which traverses these paths.
There is a lot of mutation of `self`, making it quite hard to keep a handle on things.
In all these (several thousand?) LoC, it is essentially ascertaining in which node of the tree
to insert each function definition.
In a resource-constrained environment like plutus, this effort is warranted.
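One way to phrase "once, but as close to use as possible": if every use site is described by its path from the root (a list of child indices), the deepest node that is an ancestor of all uses is the longest common prefix of those paths. This is a sketch of that idea under an assumed path representation (`TreePath` and `insertion_point` are hypothetical names), not a transcription of `hoist_functions_to_validator`, which deals with considerably more.

```rust
// A path from the root to a node: the child index taken at each level.
pub type TreePath = Vec<usize>;

// The deepest node lying on the path to every use site is the longest
// common prefix of the use-site paths. Defining the function there means
// it appears once, yet as close to all of its uses as possible.
pub fn insertion_point(uses: &[TreePath]) -> Option<TreePath> {
    let (first, rest) = uses.split_first()?;
    let mut prefix_len = first.len();
    for path in rest {
        let common = first
            .iter()
            .zip(path.iter())
            .take_while(|(a, b)| a == b)
            .count();
        prefix_len = prefix_len.min(common);
    }
    Some(first[..prefix_len].to_vec())
}
```

For uses at `[0, 2, 1]` and `[0, 2, 3, 0]` the definition lands at `[0, 2]`; uses with nothing in common push it up to the root `[]`.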

At the same time, this function deals with

- monomorphisation: no more generics
- erasing opaque types

Neither generics nor opaque types exist at the uplc level.

#### Into Air

`to_vec : AirTree -> Vec<Air>` is much easier to digest.
For one, it is not evaluated in the context of the `CodeGenerator`,
and for two, there is no mutation of the airtree.
The function recursively takes nodes of the tree and maps them to entries in a mutable vector:
it flattens the tree into a vec.
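A minimal sketch of that kind of flattening on a toy tree (`Node` and `Flat` are hypothetical, not aiken's `Air`): a pre-order walk pushes each node onto a shared mutable vector, recording how many children it has, so the flat form can still be decoded back into the same tree. Whether aiken's `Air` carries its shape in exactly this way is an assumption made for the sketch.

```rust
// Toy tree: a label plus its children.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub label: String,
    pub children: Vec<Node>,
}

// Flat entry: the label and how many children followed it in the tree.
#[derive(Debug, Clone, PartialEq)]
pub struct Flat {
    pub label: String,
    pub arity: usize,
}

// Pre-order traversal: push the parent, then recurse into each child in order.
fn push_nodes(node: &Node, out: &mut Vec<Flat>) {
    out.push(Flat { label: node.label.clone(), arity: node.children.len() });
    for child in &node.children {
        push_nodes(child, out);
    }
}

pub fn to_vec(tree: &Node) -> Vec<Flat> {
    let mut out = Vec::new();
    push_nodes(tree, &mut out);
    out
}

// Inverse of `to_vec`: consume entries in the same pre-order to rebuild the tree.
pub fn from_vec(flat: &[Flat]) -> Option<Node> {
    fn go(it: &mut std::slice::Iter<'_, Flat>) -> Option<Node> {
        let entry = it.next()?;
        let mut children = Vec::with_capacity(entry.arity);
        for _ in 0..entry.arity {
            children.push(go(it)?);
        }
        Some(Node { label: entry.label.clone(), children })
    }
    go(&mut flat.iter())
}
```

Recording the arity is what keeps the flat vector lossless: without it, the pre-order labels alone could not tell where one subtree ends and the next begins.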

### Down to uplc

Next we go from `Vec<Air> -> Term<Name>`.
This step is a little more involved than the previous one.
For one, it is executed in the context of the code generator.
Moreover, the code generator is treated as mutable (ouch).

On further inspection, we see that the only mutation is setting `self.needs_field_access = true`.
This flag informs the compiler that, if true, additional terms must be added in one of the final steps
(see `CodeGenerator::finalize`).

As noted above, some of the mappings from air to terms are immediate, like `Air::Bool -> Term::bool`.
Others are less so.
Some examples:

- `Air::Var` requires 100 LoC of case handling on different constructors.
- Lists in air have no immediate analogue in uplc.
- Builtins (built-in functions, to use the standard shorthand) have to be mediated
  with some combination of `force` and `delay` in order to behave as they should.
- User functions must be curried, ie treated as a sequence of single-argument functions,
  and recursion must be handled.
- Some magic is done in order to efficiently allow "record updates".
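A small sketch of the currying step on a hypothetical lambda-calculus term type, with single-parameter `Lambda` and single-argument `Apply`, which is the shape uplc offers: a multi-parameter definition becomes nested one-parameter lambdas, and a multi-argument call becomes nested applications. The names are illustrative only, and recursion handling and the `force`/`delay` treatment of builtins are not covered.

```rust
// Hypothetical lambda-calculus terms: every lambda binds exactly one parameter,
// and every application passes exactly one argument.
#[derive(Debug, Clone, PartialEq)]
pub enum Term {
    Var(String),
    Lambda(String, Box<Term>),
    Apply(Box<Term>, Box<Term>),
}

// fn(a, b, c) { body }  ==>  \a. \b. \c. body
pub fn curry_fn(params: &[&str], body: Term) -> Term {
    params
        .iter()
        .rev()
        .fold(body, |acc, p| Term::Lambda(p.to_string(), Box::new(acc)))
}

// f(x, y, z)  ==>  ((f x) y) z
pub fn curry_call(func: Term, args: Vec<Term>) -> Term {
    args.into_iter()
        .fold(func, |acc, arg| Term::Apply(Box::new(acc), Box::new(arg)))
}

// Render with explicit parentheses so the nesting is visible.
pub fn show(t: &Term) -> String {
    match t {
        Term::Var(v) => v.clone(),
        Term::Lambda(p, b) => format!("(\\{}. {})", p, show(b)),
        Term::Apply(f, x) => format!("({} {})", show(f), show(x)),
    }
}
```

For example, `curry_fn(&["a", "b"], a)` renders as `(\a. (\b. a))`, and calling `f` with `x` and `y` renders as `((f x) y)`.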

#### Cranking the Optimizer

A sequence of operations `Term<Name> -> Term<Name>` is then performed on the uplc.
These remove inconsequential parts of the logic that would otherwise appear in the output.
These include:

- removing applications of the identity function
- directly substituting where a lambda is applied to a constant or a builtin
- inlining or simplifying where a lambda is applied to an argument and its parameter appears once or not at all

Each of these optimizing methods has its own relatively narrow focus,
and so although there is a fair number of LoC, it's reasonably straightforward to follow.
Some are applied multiple times.
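Two of these rewrites, sketched on a hypothetical term type: removing applications of the identity function, and substituting a constant for a lambda's parameter when the lambda is applied directly to that constant. This is a simplified illustration of the patterns listed, not aiken's optimizer. Substitution stops at a lambda that rebinds the same name, which is enough here because a constant has no free variables to capture.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Term {
    Var(String),
    Const(i64),
    Lambda(String, Box<Term>),
    Apply(Box<Term>, Box<Term>),
}

// Replace free occurrences of `name` in `t` with the constant `value`.
// A lambda that rebinds `name` shadows it, so substitution stops there.
fn substitute(t: &Term, name: &str, value: i64) -> Term {
    match t {
        Term::Var(v) if v == name => Term::Const(value),
        Term::Lambda(p, body) if p != name => {
            Term::Lambda(p.clone(), Box::new(substitute(body, name, value)))
        }
        Term::Apply(f, x) => Term::Apply(
            Box::new(substitute(f, name, value)),
            Box::new(substitute(x, name, value)),
        ),
        other => other.clone(),
    }
}

// One bottom-up pass applying both rewrites.
pub fn optimize(t: &Term) -> Term {
    match t {
        Term::Lambda(p, body) => Term::Lambda(p.clone(), Box::new(optimize(body))),
        Term::Apply(f, x) => {
            let f = optimize(f);
            let x = optimize(x);
            if let Term::Lambda(p, body) = &f {
                // (\p. p) x  ==>  x
                if **body == Term::Var(p.clone()) {
                    return x;
                }
                // (\p. body) c  ==>  body[p := c]
                if let Term::Const(c) = x {
                    return substitute(body, p, c);
                }
            }
            Term::Apply(Box::new(f), Box::new(x))
        }
        other => other.clone(),
    }
}
```

A single pass can expose new opportunities (a substitution may create a fresh identity application, for instance), which is one reason to run such passes more than once.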

### The End

The generated program can now be serialized and included in the blueprint.

### Plutus Core Signposting

All this fuss is to get us to a point where we can write uplc, and good uplc at that.
Note that there are many ways to generate code, and most of them are bad.
The various design decisions and compilation steps make more sense
when we have a better understanding of the target language.

Uplc is a lambda calculus.
For a comprehensive definition of uplc, check out the specification found
[here](https://github.com/input-output-hk/plutus/#specifications-and-design) in the plutus github repo.
(I imagine this link will be maintained longer than the current direct link.)
If you're not at all familiar with lambda calculus, I recommend
[an unpacking](https://crypto.stanford.edu/~blynn/lambda/) by Ben Lynn.

### What next?

I think it would be helpful to have some examples... Watch this space.

@ -1,228 +0,0 @@
Aims:

- Describe the pipeline and components that take us from aiken to uplc.

## Preface

Aiken is undergoing active development.
This post was started with Aiken ~v1.14.
With Aiken v1.15, there were already reasonably significant changes to the compilation pipeline.
The word is that there won't be changes as big as these in the near future, but
this article will undoubtedly begin to diverge from the current codebase even before publishing.

## Aiken build

Tracing `aiken build`, the pipeline is roughly something like:
```
. -> Project::read_source_files ->
Vec<Source> -> Project::parse_sources ->
ParsedModules -> Project::type_check ->
CheckedModules -> CodeGenerator::build ->
AirTree -> AirTree::to_vec ->
Vec<Air> -> CodeGenerator::uplc_code_gen ->
Program / Term<Name> -> serialize ->
.
```
We'll pick our way through these steps.

At a high level we are trying to do something straightforward: reformulate aiken code as uplc.
Some aiken expressions are relatively easy to handle: for example, an aiken `Int` goes to an `Int` in uplc.
Others require more involved handling: for example, the branches of an aiken `if ... else if ... else` must be "nested" in uplc.

### The Preamble

#### cli handling

The cli enters at `aiken/src/cmd/mod.rs`, which parses the command.
After establishing some context, the program enters `Project::build` (`crates/aiken-project/src/lib.rs`),
which in turn calls `Project::compile`.

#### File crawl

The program looks for aiken files in both the `./lib` and `./validators` subdirectories.
For each, it walks over all the contents (recursively), looking for files with the `.ak` extension.
It treats these two sets of files a little differently.
Only validator files can contain the special validator functions.

#### Parse and Type check

`Project::parse_sources` parses the module source code.
The heavy lifting is done by `aiken_lang::parser::module`, which is evaluated on each file.
It produces a `Module` containing a list of the file's parsed definitions (functions, types _etc_),
together with "metadata" like docstrings and the file path.

`Project::type_check` inspects the parsed modules and, as the name implies, checks the types.
It flags type-level warnings and errors.
It constructs a hash map of `CheckedModule`s.

#### Code generator

The code generator `CodeGenerator` (`aiken-lang/src/gen_uplc.rs`) is given
the definitions found in the previous step,
together with the plutus builtins.
It also has additional fields for things like debugging.

This is handed over to a `Blueprint` (`aiken-project/src/blueprint/mod.rs`).
A blueprint does little more than find the validators on which to run the code gen.
The heavy lifting is done by `CodeGenerator::generate`.

We are now ready to take the source code and create plutus.

### Up in the air

Things become a bit intimidating at this point in terms of sheer lines of code:
`gen_uplc.rs` and the three modules in `gen_uplc/` total > 8500 LoC.

Aiken has its own _intermediate representation_ called `air` (as in Aiken Intermediate Representation).
Intermediate representations are common in compiled languages.
`Air` is defined in `aiken-lang/src/gen_uplc/air.rs`.
Unsurprisingly, it looks a little like a language between aiken and plutus.

In fact, Aiken has another intermediate representation: `AirTree`.
This is constructed between the `TypedExpr` and the `Vec<Air>`, ie between type-checked aiken and air.

#### AirTree

Within `CodeGenerator::generate`, `CodeGenerator::build` is called on the function body.
This constructs and returns an `AirTree`.
More on what an airtree is and how it is constructed below.
At the same time `self` is treated as `mut`, so we need to keep an eye on this too.
The method that is called and makes use of this mutability of `self` is `self.assignment`.
It does so via
```sample
self.assignment >> self.expect_type_assign >> self.code_gen_functions.insert
```
and thus creates a hashmap of all the functions that appear in the definition.
(`self.handle_each_clause` is also called with `mut`, since it in turn calls `self.build`, for which `mut` is needed.
`self.clause_pattern` is called with `mut`, but the mutability isn't used.)

###### Codegen assignment

~200 LoC

###### Codegen expect type assign

~400 LoC

###### ... Back to build

Validators in aiken are boolean functions, while in uplc they are unit-valued (aka void-valued) functions.
Thus the airtree is wrapped such that `false` results in an error (`wrap_validator_condition`).
(Ed: I don't know why there is a prevailing thought that boolean functions are preferable to functions
that simply error if anything is wrong.)

`check_validator_args` again extends the airtree from the previous step,
and again calls `self.assignment`, mutating `self`.
Something interesting is happening here.
The script context is the final argument of a validator, for any script purpose.
`check_validator_args` treats the script context as if it were an unused argument.
We'll circle back to how this works later on.

Next we encounter
```rust
AirTree::no_op().hoist_over(validator_args_tree);
```
It's not very apparent why we need to do this. Let's look ahead and consider this later.

The final airtree-to-airtree step(s) are in `self.hoist_functions_to_validator`.
TODO: What happens here?!

Note that `AirTree` and its methods aren't fully typesafe.
For example, `hoist_over` will throw an error if called on an `Expression`.
As `AirTree` is for internal use only, the scope for potential problems is reasonably contained.

The AirTree has the following definition:
```rust
pub enum AirTree {
    Statement {
        statement: AirStatement,
        hoisted_over: Option<Box<AirTree>>,
    },
    Expression(AirExpression),
    UnhoistedSequence(Vec<AirTree>),
}
```
We can see it has a tree-like structure, as the name suggests.

`AirExpression` has multiple constructors. These include (non-exhaustively):
- air primitives (including all the ones that appear in plutus)
- constructors `Call` and `Fn` to handle functions
- binary and unary operators
- handling of `when` and `if`
- error and tracing

`AirStatement` also has multiple constructors.

for handling functions, plutus primitives, along with
An `AirStatement`

## Down to uplc

## Air

Aiken compiles aiken code to uplc via _air_:
Aiken Intermediate Representation.

## Trace

Running `aiken build`...

The cli (see `aiken/src/cmd/mod.rs`) parses the command,
finds the context and calls `Project::build` (`crates/aiken-project/src/lib.rs`),
which in turn calls `Project::compile`.

#### `Project::compile`

1. Check dependencies are available, _eg_ aiken stdlib.
2. Read source files.
   1. Walk over `./lib` and `./validators` and push aiken modules onto `Project.sources`.
3. Parse each source in sources:
   1. Generate a `ParsedModule` containing the `ast`, `docs`, _etc_.
      The `ast` here is an `UntypedModule`, which contains untyped definitions.
4. Type check each parsed module.
   1. For each untyped module, create a `CheckedModule`.
      This includes typed definitions.
5. `compile` forks into two depending on whether it's been called with `build` or `check`.
6. From `CheckedModules` construct a `CodeGenerator`.
7. Pass the generator to construct a new `Blueprints`.
   1. Blueprints finds validators from checked modules.
   2. From each it constructs a `Validator` with the constructor `Validator::from_checked_module` (which returns a vector of validators).
      1. It's here that the magic happens: the method `generator.generate(def)` is called,
         where `def` is the typed validator(s).
         This method outputs a `Program<Name>` which contains the UPLC.
      2. These are collected together.
   3. The rest is collecting and handling the errors and warnings and writing the blueprint.
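Step 5, the fork between `build` and `check`, can be pictured with the toy below. All names (`Mode`, `Outcome`, `compile`, `parse_and_check`) are hypothetical, and strings stand in for modules; it only shows that both commands share the parse and type-check stages, and that only `build` goes on to code generation.

```rust
// Which command invoked `compile` (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    Build,
    Check,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    // `check`: stop once every module has been type checked.
    Checked(Vec<String>),
    // `build`: carry on and generate a program for each validator module.
    Built(Vec<String>),
}

// Toy stand-in for parsing plus type checking: rejects empty modules.
fn parse_and_check(sources: &[&str]) -> Result<Vec<String>, String> {
    sources
        .iter()
        .map(|s| {
            let s = s.trim();
            if s.is_empty() {
                Err("empty module".to_string())
            } else {
                Ok(s.to_string())
            }
        })
        .collect()
}

pub fn compile(mode: Mode, sources: &[&str]) -> Result<Outcome, String> {
    // Stages shared by both commands; an error here stops either one.
    let checked = parse_and_check(sources)?;
    // The fork: only `build` goes on to code generation.
    match mode {
        Mode::Check => Ok(Outcome::Checked(checked)),
        Mode::Build => Ok(Outcome::Built(
            checked
                .iter()
                .filter(|m| m.starts_with("validator"))
                .map(|m| format!("program for {}", m))
                .collect(),
        )),
    }
}
```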

#### `CodeGenerator::generate`

1. Create a new `AirStack`.

#### `AirStack`

Consists of:
1. An Id
2. A `Scope`
3. A vector of `Air`

The Scope keeps track of ... [TODO]

#### Air

Air is a typed language... [TODO]