This commit is contained in:
waalge 2023-09-02 12:15:26 +00:00
parent 38e68b5316
commit e2db317b30
1 changed files with 34 additions and 34 deletions

View File

@ -1,16 +1,16 @@
Aims: Aims:
> Describe the pipeline and components getting from aiken to uplc. > Describe the pipeline and components getting from Aiken to Uplc.
## The Preface ## The Preface
### Motivations ### Motivations
The motivation for writing this came from a desire to add additional features to aiken not yet available. The motivation for writing this came from a desire to add additional features to Aiken not yet available.
One such feature would evaluate an arbitrary function in aiken callable from javascript. One such feature would evaluate an arbitrary function in Aiken callable from JavaScript.
This would help a lot with testing trying to align on and off-chain code. This would help a lot with testing trying to align on and off-chain code.
Another more pipe dreamy, adhoc function extraction - from a span of code, generate a function. Another more pipe dreamy, ad-hoc function extraction - from a span of code, generate a function.
A digression to answer _why would this be at all helpful?!_ A digression to answer _why would this be at all helpful?!_
Validator logic often needs a broad context throughout. Validator logic often needs a broad context throughout.
How then to best factor code? How then to best factor code?
@ -23,14 +23,14 @@ Possible solutions:
The problems are: The problems are:
1. Requires relentless constructing and deconstructing across the function call. 1. Requires relentless constructing and deconstructing across the function call.
And this is adds costs in aiken. And this is adds costs in Aiken.
2. Becomes tedious aligning the definition and function call. 2. Becomes tedious aligning the definition and function call.
3. End up with very long validators which are hard to unit test. 3. End up with very long validators which are hard to unit test.
My current preferred way is to accept that validator functions are long. My current preferred way is to accept that validator functions are long.
Adhoc function extraction would allow for sections of code to be tested without needing to be factored out. Ad-hoc function extraction would allow for sections of code to be tested without needing to be factored out.
To do either of these, we need to get to grips with the aiken compilation pipeline. To do either of these, we need to get to grips with the Aiken compilation pipeline.
### This won't age well ### This won't age well
@ -38,11 +38,11 @@ Aiken is undergoing active development.
This post was started life with Aiken ~v1.14. This post was started life with Aiken ~v1.14.
With Aiken v1.15, there were already reasonably significant changes to the compilation pipeline. With Aiken v1.15, there were already reasonably significant changes to the compilation pipeline.
The word is that there aren't as big changes in the near future, The word is that there aren't as big changes in the near future,
but this article will undoubtably begin to diverge from the current codebase even before publishing. but this article will undoubtedly begin to diverge from the current code-base even before publishing.
### Limitations of narating code ### Limitations of narrating code
Narating code becomes a compromise between being honest and accurate, and being readable and digestable. Narrating code becomes a compromise between being honest and accurate, and being readable and digestible.
Following the command `aiken build` covers well in excess of 10,000 LoC. Following the command `aiken build` covers well in excess of 10,000 LoC.
The writing of this post ground slowly to a halt as it progressed deeper into the code The writing of this post ground slowly to a halt as it progressed deeper into the code
with the details seeming to increase in importance. with the details seeming to increase in importance.
@ -63,10 +63,10 @@ Tracing `aiken build`, the pipeline is roughly:
``` ```
We'll pick our way through these steps We'll pick our way through these steps
At a high level we are trying to do something straightforward: reformulate aiken code as uplc. At a high level we are trying to do something straightforward: reformulate Aiken code as Uplc.
Some aiken expressions are relatively easy to handle for example an aiken `Int` goes to an `Int` in uplc. Some Aiken expressions are relatively easy to handle for example an Aiken `Int` goes to an `Int` in Uplc.
Some aiken expressions require more involved handling, for example an aiken `If... If Else... Else ` Some Aiken expressions require more involved handling, for example an Aiken `If... If Else... Else `
must have the branches "nested" in uplc. must have the branches "nested" in Uplc.
Aiken also have lots of nice-to-haves like pattern matching, modules, and generics. Aiken also have lots of nice-to-haves like pattern matching, modules, and generics.
Uplc has none of these. Uplc has none of these.
@ -80,7 +80,7 @@ which in turn calls `Project::compile`.
#### File crawl #### File crawl
The program looks for aiken files in both `./lib` and `./validator` subdirs. The program looks for Aiken files in both `./lib` and `./validator` sub-directories.
For each it walks over all contents (recursively) looking for `.ak` extensions. For each it walks over all contents (recursively) looking for `.ak` extensions.
It treats these two sets of files a little differently. It treats these two sets of files a little differently.
For example, only validator files can contain the special validator functions. For example, only validator files can contain the special validator functions.
@ -116,10 +116,10 @@ Things become a bit intimidating at this point in terms of sheer lines of code:
Aiken has its own _intermediate representation_ called `air` (as in Aiken Intermediate Representation). Aiken has its own _intermediate representation_ called `air` (as in Aiken Intermediate Representation).
These are common in compiled languages. These are common in compiled languages.
`Air` is defined in `aiken-lang/src/gen_uplc/air.rs`. `Air` is defined in `aiken-lang/src/gen_uplc/air.rs`.
Unsurprisingly, it looks little bit like a language between aiken and plutus. Unsurprisingly, it looks little bit like a language between Aiken and plutus.
In fact, Aiken has another intermediate representation: `AirTree`. In fact, Aiken has another intermediate representation: `AirTree`.
This is constructed between the `TypedExpr` and `Vec<Air>` ie between parsed aiken and air. This is constructed between the `TypedExpr` and `Vec<Air>` ie between parsed Aiken and air.
#### Climbing the AirTree #### Climbing the AirTree
@ -137,8 +137,8 @@ and thus is creating a hashmap of all the functions that appear in the definitio
From the call to return of `assign` covers > 600 LoC so we'll leave this as otherwise a black box. From the call to return of `assign` covers > 600 LoC so we'll leave this as otherwise a black box.
(`self.handle_each_clause` is also called with `mut` which in turn calls `self.build` for which `mut` it is needed.) (`self.handle_each_clause` is also called with `mut` which in turn calls `self.build` for which `mut` it is needed.)
Validators in aiken are boolean functions while in uplc they are unit-valued (aka void-valued) functions. Validators in Aiken are boolean functions while in Uplc they are unit-valued (aka void-valued) functions.
Thus the airtree is wrapped such that `false` results in an error (`wrap_validator_condition`). Thus the air tree is wrapped such that `false` results in an error (`wrap_validator_condition`).
I don't know why there is a prevailing thought that boolean functions are preferable than functions I don't know why there is a prevailing thought that boolean functions are preferable than functions
that error if anything is wrong - which is what validators are. that error if anything is wrong - which is what validators are.
@ -160,7 +160,7 @@ pub enum AirTree {
UnhoistedSequence(Vec<AirTree>), UnhoistedSequence(Vec<AirTree>),
} }
``` ```
Note that `AirStatement` and `AirExpression` are mutually recusive definitions with `AirTree`. Note that `AirStatement` and `AirExpression` are mutually recursive definitions with `AirTree`.
Otherwise, it would be unclear from first inspection how tree-like this really is. Otherwise, it would be unclear from first inspection how tree-like this really is.
`AirExpression` has multiple constructors. These include (non-exhaustive) `AirExpression` has multiple constructors. These include (non-exhaustive)
@ -176,7 +176,7 @@ Otherwise, it would be unclear from first inspection how tree-like this really i
- let assignments and named function definitions - let assignments and named function definitions
- handling expect assignments - handling expect assignments
- pattern matching - pattern matching
- unwrapping datastructures - unwrapping data structures
Note that `AirTree` has many methods that are partial functions, Note that `AirTree` has many methods that are partial functions,
as in there are possible states that are not considered legitimate as in there are possible states that are not considered legitimate
@ -184,20 +184,20 @@ at different points of its construction and use.
For example `hoist_over` will throw an error if called on an `Expression`. For example `hoist_over` will throw an error if called on an `Expression`.
As `AirTree` is for internal use only, the scope for potential problems is reasonably contained. As `AirTree` is for internal use only, the scope for potential problems is reasonably contained.
It seems likely this is to avoid similar-yet-different IRs between steps. It seems likely this is to avoid similar-yet-different IRs between steps.
However, the trade off is that it partially obsufucates what is a valid state where. However, the trade off is that it partially obfuscates what is a valid state where.
What is hoisting? hoisting gives the airtree depth. What is hoisting? hoisting gives the airtree depth.
The motivation is that by the time we hit uplc it is "generally better" The motivation is that by the time we hit Uplc it is "generally better"
that that
- function defintions appear once rather than being inlined multiple times - function definitions appear once rather than being inlined multiple times
- the definition appears as close to use as possible - the definition appears as close to use as possible
Hoisting creates tree paths. Hoisting creates tree paths.
The final airtree to airtree step is`self.hoist_functions_to_validator` traverses the paths. The final airtree to airtree step is`self.hoist_functions_to_validator` traverses the paths.
There is a lot of mutating of self, making it quite hard to keep a handle on things. There is a lot of mutating of self, making it quite hard to keep a handle on things.
In all this (several thousand?) LoC, it is essentially ascertaining in which node of the tree In all this (several thousand?) LoC, it is essentially ascertaining in which node of the tree
to insert each function definiton. to insert each function definition.
In a resource constrained environment like plutus, this effort is warranted. In a resource constrained environment like plutus, this effort is warranted.
At the same time this function deals with At the same time this function deals with
@ -205,22 +205,22 @@ At the same time this function deals with
- monomophisation - no more generics - monomophisation - no more generics
- erasing opaque types - erasing opaque types
Neither of which exist at the uplc level. Neither of which exist at the Uplc level.
#### Into Air #### Into Air
The `to_vec : AirTree -> Vec<Air>` is much easier to digest. The `to_vec : AirTree -> Vec<Air>` is much easier to digest.
For one, it is not evaluated in the context of the CodeGenerator, For one, it is not evaluated in the context of the code generator,
and two, there is no mutation of the airtree. and two, there is no mutation of the airtree.
The function recursively takes nodes of the tree and maps them to entries in a mutable vector. The function recursively takes nodes of the tree and maps them to entries in a mutable vector.
It flattens the tree to a vec. It flattens the tree to a vec.
### Down to uplc ### Down to Uplc
Next we go from `Vec<Air> -> Term<Name>`. Next we go from `Vec<Air> -> Term<Name>`.
This step is a little more involved than the previous. This step is a little more involved than the previous.
For one, this is executed in the context of the code generator. For one, this is executed in the context of the code generator.
Moreover, the code generatore is treated mutable - ouch. Moreover, the code generator is treated mutable - ouch.
On further inspection we see that the only mutation is setting `self.needs_field_access = true`. On further inspection we see that the only mutation is setting `self.needs_field_access = true`.
This flag informs the compiler that, if true, additional terms must be added in one of the final steps This flag informs the compiler that, if true, additional terms must be added in one of the final steps
@ -240,13 +240,13 @@ and recursion must be handled
#### Cranking the Optimizer #### Cranking the Optimizer
There is a sequence of operations perfromed on the uplc mapping `Term<Name> -> Term<Name>`. There is a sequence of operations performed on the Uplc mapping `Term<Name> -> Term<Name>`.
These remove inconsequential parts of the logic which will appear. These remove inconsequential parts of the logic which will appear.
These include: These include:
- removing application of the identity function - removing application of the identity function
- directly substituting where apply lambda is applied to a constant or builtin - directly substituting where apply lambda is applied to a constant or builtin
- inline or simplify where apply lambda is applied to a param that appears once or not at all - inline or simplify where apply lambda is applied to a parameter that appears once or not at all
Each of these optimizing methods has a its own relatively narrow focus, Each of these optimizing methods has a its own relatively narrow focus,
and so although there is a fair number of LoC, it's reasonably straightforward to follow. and so although there is a fair number of LoC, it's reasonably straightforward to follow.
@ -258,14 +258,14 @@ The generated program can now be serialized and included in the blueprint.
### Plutus Core Signposting ### Plutus Core Signposting
All this fuss is to get us to a point where we can write uplc - and good uplc at that. All this fuss is to get us to a point where we can write Uplc - and good Uplc at that.
Note that there's many ways to generate code and most of them are bad. Note that there's many ways to generate code and most of them are bad.
The various design decisions and compilation steps make more sense The various design decisions and compilation steps make more sense
when we have a better understanding of the target language. when we have a better understanding of the target language.
Uplc is a lambda calculus. Uplc is a lambda calculus.
For a comprehensive definition on uplc checkout the specification found For a comprehensive definition on Uplc checkout the specification found
[here](https://github.com/input-output-hk/plutus/#specifications-and-design) from the plutus github repo. [here](https://github.com/input-output-hk/plutus/#specifications-and-design) from the plutus GitHub repo.
(I imagine this link will be maintained longer than the current actual link.) (I imagine this link will be maintained longer than the current actual link.)
If you're not at all familiar with lambda calculus I recommend If you're not at all familiar with lambda calculus I recommend
[an unpacking](https://crypto.stanford.edu/~blynn/lambda/) by Ben Lynn. [an unpacking](https://crypto.stanford.edu/~blynn/lambda/) by Ben Lynn.