tracing aiken build: proofread

waalge 2023-09-02 20:05:48 +00:00
parent e2db317b30
commit b340cfd2f0
1 changed file with 21 additions and 23 deletions


@ -8,7 +8,7 @@ Aims:
The motivation for writing this came from a desire to add features to Aiken that are not yet available.
One such feature would evaluate an arbitrary function in Aiken callable from JavaScript.
-This would help a lot with testing trying to align on and off-chain code.
+This would help a lot with testing and when trying to align on and off-chain code.
Another, more pipe-dreamy feature is ad-hoc function extraction: from a span of code, generate a function.
A digression to answer _why would this be at all helpful?!_
@ -23,9 +23,9 @@ Possible solutions:
The problems are:
1. Requires relentless constructing and deconstructing across the function call.
-And this is adds costs in Aiken.
+This adds costs.
2. Becomes tedious aligning the definition and function call.
-3. End up with very long validators which are hard to unit test.
+3. Ends up with very long validators which are hard to unit test.
My current preferred way is to accept that validator functions are long.
Ad-hoc function extraction would allow for sections of code to be tested without needing to be factored out.
@ -35,18 +35,17 @@ To do either of these, we need to get to grips with the Aiken compilation pipeli
### This won't age well
Aiken is undergoing active development.
-This post was started life with Aiken ~v1.14.
-With Aiken v1.15, there were already reasonably significant changes to the compilation pipeline.
-The word is that there aren't as big changes in the near future,
+This post started life with Aiken ~v1.14.
+Aiken v1.15 introduced reasonably significant changes to the compilation pipeline.
+The word is that there aren't any more big changes in the near future,
but this article will undoubtedly begin to diverge from the current code-base even before publishing.
### Limitations of narrating code
Narrating code becomes a compromise between being honest and accurate, and being readable and digestible.
-Following the command `aiken build` covers well in excess of 10,000 LoC.
-The writing of this post ground slowly to a halt as it progressed deeper into the code
-with the details seeming to increase in importance.
-At some point I had to draw a line and resign to fact that some parts will remain black boxes for now.
+The command `aiken build` covers well in excess of 10,000 LoC.
+The writing of this post ground to a halt as it reached deeper into the code-base.
+To redeem it, some (possibly large) sections remain black boxes.
## Aiken build
@ -67,7 +66,7 @@ At a high level we are trying to do something straightforward: reformulate Aiken
Some Aiken expressions are relatively easy to handle: for example, an Aiken `Int` goes to an `Int` in Uplc.
Some Aiken expressions require more involved handling: for example, an Aiken `if ... else if ... else`
must have its branches "nested" in Uplc.
-Aiken also have lots of nice-to-haves like pattern matching, modules, and generics.
+Aiken has lots of nice-to-haves like pattern matching, modules, and generics;
Uplc has none of these.
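To make the nesting concrete, here is a minimal sketch with toy types. `Leaf`, `IfChain`, `Term` and `nest` are hypothetical names, not Aiken's AST or Uplc's term type, and the binary `IfThenElse` node stands in for what real Uplc does with the `ifThenElse` builtin and delayed branches. The chain is folded from the last branch backwards, so each `else` slot holds the rest of the chain.

```rust
// Hypothetical toy types, not Aiken's actual AST or Uplc terms.
#[derive(Debug, PartialEq)]
enum Leaf {
    Var(&'static str),
    Int(i64),
}

/// Source side: a flat chain `if c1 { b1 } else if c2 { b2 } ... else { e }`.
struct IfChain {
    branches: Vec<(Leaf, Leaf)>,
    otherwise: Leaf,
}

/// Target side: only a binary if-then-else exists, so chains must nest.
#[derive(Debug, PartialEq)]
enum Term {
    Leaf(Leaf),
    IfThenElse(Box<Term>, Box<Term>, Box<Term>),
}

/// Fold from the last branch backwards so that each `else` slot
/// holds the rest of the chain.
fn nest(chain: IfChain) -> Term {
    chain
        .branches
        .into_iter()
        .rev()
        .fold(Term::Leaf(chain.otherwise), |rest, (cond, body)| {
            Term::IfThenElse(
                Box::new(Term::Leaf(cond)),
                Box::new(Term::Leaf(body)),
                Box::new(rest),
            )
        })
}

fn main() {
    let chain = IfChain {
        branches: vec![(Leaf::Var("a"), Leaf::Int(1)), (Leaf::Var("b"), Leaf::Int(2))],
        otherwise: Leaf::Int(3),
    };
    // Nested form: if a then 1 else (if b then 2 else 3)
    println!("{:?}", nest(chain));
}
```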
### The Preamble
@ -114,9 +113,9 @@ Things become a bit intimidating at this point in terms of sheer lines of code:
`gen_uplc.rs` and three modules in `gen_uplc/` total > 8500 LoC.
Aiken has its own _intermediate representation_ called `air` (as in Aiken Intermediate Representation).
-These are common in compiled languages.
+Intermediate representations are common in compiled languages.
`Air` is defined in `aiken-lang/src/gen_uplc/air.rs`.
-Unsurprisingly, it looks little bit like a language between Aiken and plutus.
+Unsurprisingly, it looks a little bit like a language between Aiken and Plutus.
In fact, Aiken has another intermediate representation: `AirTree`.
It is constructed in between `TypedExpr` and `Vec<Air>`, i.e. between type-checked Aiken and air.
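Going from a tree such as `AirTree` to a flat `Vec<Air>` amounts to flattening. The sketch below assumes a pre-order layout in which a call node records how many children follow it; `Tree`, `Op` and `flatten` are hypothetical and are not Aiken's definitions.

```rust
// Hypothetical types: a tree IR (in the spirit of `AirTree`) and a flat
// instruction list (in the spirit of `Vec<Air>`). Not Aiken's definitions.
#[derive(Debug)]
enum Tree {
    Int(i64),
    Var(String),
    Call { func: Box<Tree>, args: Vec<Tree> },
}

/// Flat form: `Call` records its arity so that a later pass can tell
/// how many of the following entries belong to it.
#[derive(Debug, PartialEq)]
enum Op {
    Int(i64),
    Var(String),
    Call { arity: usize },
}

/// Pre-order traversal: emit the parent, then the function, then each argument.
fn flatten(tree: &Tree, out: &mut Vec<Op>) {
    match tree {
        Tree::Int(n) => out.push(Op::Int(*n)),
        Tree::Var(name) => out.push(Op::Var(name.clone())),
        Tree::Call { func, args } => {
            out.push(Op::Call { arity: args.len() });
            flatten(func, out);
            for arg in args {
                flatten(arg, out);
            }
        }
    }
}

fn main() {
    // add(1, mul(2, x))
    let tree = Tree::Call {
        func: Box::new(Tree::Var("add".to_string())),
        args: vec![
            Tree::Int(1),
            Tree::Call {
                func: Box::new(Tree::Var("mul".to_string())),
                args: vec![Tree::Int(2), Tree::Var("x".to_string())],
            },
        ],
    };
    let mut ops = Vec::new();
    flatten(&tree, &mut ops);
    println!("{:?}", ops);
}
```

In this toy layout, recording the arity is what lets a later pass rebuild the nesting from the flat list with a stack.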
@ -134,12 +133,12 @@ It does so by
self.assignment >> self.expect_type_assign >> self.code_gen_functions.insert
```
and thus creates a hashmap of all the functions that appear in the definition.
-From the call to return of `assign` covers > 600 LoC so we'll leave this as otherwise a black box.
+From the call to return of `assign` covers > 600 LoC so we'll leave this as a black box.
(`self.handle_each_clause` is also called with `mut`, since it in turn calls `self.build`, which needs `mut`.)
Validators in Aiken are boolean functions while in Uplc they are unit-valued (aka void-valued) functions.
Thus the air tree is wrapped such that `false` results in an error (`wrap_validator_condition`).
-I don't know why there is a prevailing thought that boolean functions are preferable than functions
+I don't know why there is a prevailing thought that boolean functions are preferable to functions
that error if anything is wrong - which is what validators are.
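The bool-to-error wrapping can be pictured in Rust, with `Result<(), _>` standing in for "returns unit or halts with an error". This is a minimal sketch, not `wrap_validator_condition` itself; `ScriptError` and `run_wrapped` are hypothetical names.

```rust
/// Hypothetical stand-in for a script that halts with an error.
#[derive(Debug, PartialEq)]
struct ScriptError;

/// Turn a boolean check into a "unit or error" result:
/// `true` becomes `Ok(())`, `false` becomes an error.
fn run_wrapped<A>(check: impl Fn(&A) -> bool, arg: &A) -> Result<(), ScriptError> {
    if check(arg) {
        Ok(())
    } else {
        Err(ScriptError)
    }
}

fn main() {
    // A toy "validator": the redeemer must be positive.
    let positive = |redeemer: &i64| *redeemer > 0;
    println!("{:?}", run_wrapped(positive, &5));
    println!("{:?}", run_wrapped(positive, &-1));
}
```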
`check_validator_args` again extends the airtree from the previous step,
@ -186,7 +185,7 @@ As `AirTree` is for internal use only, the scope for potential problems is reaso
It seems likely this is to avoid similar-yet-different IRs between steps.
However, the trade-off is that it partially obscures which states are valid at which step.
-What is hoisting? hoisting gives the airtree depth.
+What is hoisting? Hoisting gives the airtree depth.
The motivation is that by the time we hit Uplc it is "generally better"
that
@ -194,7 +193,7 @@ that
 - the definition appears as close to use as possible
Hoisting creates tree paths.
-The final airtree to airtree step is`self.hoist_functions_to_validator` traverses the paths.
+The final airtree to airtree step, `self.hoist_functions_to_validator`, traverses these paths.
There is a lot of mutation of `self`, making it quite hard to keep track of things.
In all these (several thousand?) LoC, it is essentially determining in which node of the tree
to insert each function definition.
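The text does not spell out how the paths are consumed. One common way to choose where a definition goes is the deepest node lying on every use site's path, which is the longest common prefix of those paths. Here is a minimal sketch under that assumption, with paths encoded as child indices from the root; `TreePath` and `insertion_point` are hypothetical, and this is not necessarily what `hoist_functions_to_validator` does.

```rust
/// Hypothetical path encoding: child indices from the root down to a node.
type TreePath = Vec<usize>;

/// The deepest node that is an ancestor (or self) of every use site is the
/// longest common prefix of their paths. `None` means the function is unused.
fn insertion_point(uses: &[TreePath]) -> Option<TreePath> {
    let (first, rest) = uses.split_first()?;
    let mut prefix = first.clone();
    for path in rest {
        // Keep only the leading indices on which both paths agree.
        let common = prefix
            .iter()
            .zip(path.iter())
            .take_while(|(a, b)| a == b)
            .count();
        prefix.truncate(common);
    }
    Some(prefix)
}

fn main() {
    // Two uses below root -> child 0 -> child 1, one of them a level deeper.
    let uses = vec![vec![0, 1, 2], vec![0, 1]];
    println!("{:?}", insertion_point(&uses));
}
```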
@ -220,7 +219,7 @@ It flattens the tree to a vec.
Next we go from `Vec<Air> -> Term<Name>`.
This step is a little more involved than the previous.
For one, this is executed in the context of the code generator.
-Moreover, the code generator is treated mutable - ouch.
+Moreover, the code generator is treated as mutable - ouch.
On further inspection we see that the only mutation is setting `self.needs_field_access = true`.
This flag informs the compiler that, if true, additional terms must be added in one of the final steps
@ -232,7 +231,7 @@ Some examples:
 - `Air::Var` requires 100 LoC to handle the different constructors.
 - Lists in air have no immediate analogue in Uplc
-- builtins, as in built-in functions (standard shorthand), have to mediated
+- builtins, as in built-in functions (standard shorthand), have to be mediated
with some combination of `force` and `delay` in order to behave as they should.
 - user functions must be "curried", i.e. treated as a sequence of single-argument functions,
and recursion must be handled
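Two of the points above can be sketched on a toy term type: treating multi-argument functions and calls as sequences of single-argument ones, and forcing polymorphic builtins before use (in Plutus Core a builtin is forced once per type variable, so `fstPair` needs two). `Term`, `curried_lambda`, `apply_all` and `forced_builtin` are hypothetical names; `delay`, list handling and recursion are not shown.

```rust
// Hypothetical miniature of Uplc-style terms; not Aiken's `Term<Name>`.
#[derive(Debug, PartialEq)]
enum Term {
    Var(String),
    Lambda(String, Box<Term>),
    Apply(Box<Term>, Box<Term>),
    Builtin(&'static str),
    Force(Box<Term>),
}

/// A multi-parameter function becomes nested single-parameter lambdas:
/// `fn(x, y) { body }` ~> `\x -> \y -> body`.
fn curried_lambda(params: &[&str], body: Term) -> Term {
    params
        .iter()
        .rev()
        .fold(body, |acc, p| Term::Lambda(p.to_string(), Box::new(acc)))
}

/// A multi-argument call becomes one application per argument:
/// `f(a, b)` ~> `[[f a] b]`.
fn apply_all(func: Term, args: Vec<Term>) -> Term {
    args.into_iter()
        .fold(func, |acc, arg| Term::Apply(Box::new(acc), Box::new(arg)))
}

/// Polymorphic builtins are forced once per type variable before use,
/// e.g. `ifThenElse` (one type variable) or `fstPair` (two).
fn forced_builtin(name: &'static str, type_vars: usize) -> Term {
    (0..type_vars).fold(Term::Builtin(name), |acc, _| Term::Force(Box::new(acc)))
}

fn main() {
    let f = curried_lambda(&["x", "y"], Term::Var("x".to_string()));
    let call = apply_all(f, vec![Term::Var("a".to_string()), Term::Var("b".to_string())]);
    println!("{:?}", call);
    println!("{:?}", forced_builtin("fstPair", 2));
}
```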
@ -240,9 +239,8 @@ and recursion must be handled
#### Cranking the Optimizer
-There is a sequence of operations performed on the Uplc mapping `Term<Name> -> Term<Name>`.
-These remove inconsequential parts of the logic which will appear.
-These include:
+There is a sequence of operations performed on the Uplc, mapping `Term<Name> -> Term<Name>`.
+This removes inconsequential parts of the logic which have been generated, including:
 - removing application of the identity function
 - directly substituting where a lambda is applied to a constant or builtin
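The two rewrites listed above can be sketched as a single bottom-up pass over a toy term type. This assumes plain string variable names rather than Aiken's `Term<Name>`; `Term`, `substitute` and `simplify` are hypothetical, and this is not Aiken's optimizer. Substituting without renaming is safe here only because the substituted value is a constant or builtin, so it has no free variables to capture.

```rust
// Hypothetical toy terms with string names; not Aiken's `Term<Name>`.
#[derive(Debug, Clone, PartialEq)]
enum Term {
    Var(String),
    Lambda(String, Box<Term>),
    Apply(Box<Term>, Box<Term>),
    Constant(i64),
    Builtin(&'static str),
}

/// Replace free occurrences of `name` in `term` with `value`.
/// Assumes `value` is closed (a constant or builtin), so no capture can occur.
fn substitute(term: Term, name: &str, value: &Term) -> Term {
    match term {
        Term::Var(v) if v == name => value.clone(),
        Term::Lambda(param, body) => {
            if param == name {
                // `name` is shadowed below this binder; leave the body untouched.
                Term::Lambda(param, body)
            } else {
                Term::Lambda(param, Box::new(substitute(*body, name, value)))
            }
        }
        Term::Apply(f, a) => Term::Apply(
            Box::new(substitute(*f, name, value)),
            Box::new(substitute(*a, name, value)),
        ),
        other => other,
    }
}

/// Simplify children first, then try the two rewrites on the result.
fn simplify(term: Term) -> Term {
    match term {
        Term::Lambda(param, body) => Term::Lambda(param, Box::new(simplify(*body))),
        Term::Apply(f, a) => {
            let f = simplify(*f);
            let a = simplify(*a);
            match f {
                // [(\x -> x) t]  ~>  t
                Term::Lambda(param, body) if *body == Term::Var(param.clone()) => a,
                // [(\x -> body) c]  ~>  body[x := c]  when c is a constant or builtin
                Term::Lambda(param, body)
                    if matches!(a, Term::Constant(_) | Term::Builtin(_)) =>
                {
                    simplify(substitute(*body, &param, &a))
                }
                f => Term::Apply(Box::new(f), Box::new(a)),
            }
        }
        other => other,
    }
}

fn main() {
    // [(\x -> [addInteger x]) 1]  simplifies to  [addInteger 1]
    let term = Term::Apply(
        Box::new(Term::Lambda(
            "x".to_string(),
            Box::new(Term::Apply(
                Box::new(Term::Builtin("addInteger")),
                Box::new(Term::Var("x".to_string())),
            )),
        )),
        Box::new(Term::Constant(1)),
    );
    println!("{:?}", simplify(term));
}
```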
@ -259,7 +257,7 @@ The generated program can now be serialized and included in the blueprint.
### Plutus Core Signposting
All this fuss is to get us to a point where we can write Uplc - and good Uplc at that.
-Note that there's many ways to generate code and most of them are bad.
+Note that there are many ways to generate code and most of them are bad.
The various design decisions and compilation steps make more sense
when we have a better understanding of the target language.