From 38e68b53163f5d0edab9bcb21a9c839e580f4be3 Mon Sep 17 00:00:00 2001 From: waalge Date: Fri, 1 Sep 2023 13:30:49 +0000 Subject: [PATCH] edits to tracing aiken --- content/drafts/tracing-aiken-build.md | 275 ++++++++++++++++++++++++++ content/drafts/unpicking-aiken-air.md | 228 --------------------- 2 files changed, 275 insertions(+), 228 deletions(-) create mode 100644 content/drafts/tracing-aiken-build.md delete mode 100644 content/drafts/unpicking-aiken-air.md diff --git a/content/drafts/tracing-aiken-build.md b/content/drafts/tracing-aiken-build.md new file mode 100644 index 0000000..a500720 --- /dev/null +++ b/content/drafts/tracing-aiken-build.md @@ -0,0 +1,275 @@ +Aims: + +> Describe the pipeline and components getting from aiken to uplc. + +## The Preface + +### Motivations + +The motivation for writing this came from a desire to add features not yet available in aiken. +One such feature would be evaluating an arbitrary aiken function from javascript. +This would help a lot with testing when trying to align on-chain and off-chain code. + +Another, more of a pipe dream, is ad hoc function extraction: from a span of code, generate a function. +A digression to answer _why would this be at all helpful?!_ +Validator logic often needs a broad context throughout. +How then to best factor code? +Possible solutions: + +1. Introduce types / structs +2. Have functions with lots of arguments +3. Don't + +The problems are: + +1. Requires relentless constructing and deconstructing across the function call. +And this adds costs in aiken. +2. Becomes tedious aligning the definition and function call. +3. End up with very long validators which are hard to unit test. + +My current preferred way is to accept that validator functions are long. +Ad hoc function extraction would allow for sections of code to be tested without needing to be factored out. + +To do either of these, we need to get to grips with the aiken compilation pipeline. 
+ +### This won't age well + +Aiken is undergoing active development. +This post started life with Aiken ~v1.14. +With Aiken v1.15, there were already reasonably significant changes to the compilation pipeline. +The word is that no changes as big are expected in the near future, +but this article will undoubtedly begin to diverge from the current codebase even before publishing. + +### Limitations of narrating code + +Narrating code becomes a compromise between being honest and accurate, and being readable and digestible. +Following the command `aiken build` covers well in excess of 10,000 LoC. +The writing of this post ground slowly to a halt as it progressed deeper into the code +with the details seeming to increase in importance. +At some point I had to draw a line and resign myself to the fact that some parts will remain black boxes for now. + +## Aiken build + +Tracing `aiken build`, the pipeline is roughly: +``` + . -> Project::read_source_files -> + Vec<Source> -> Project::parse_sources -> + ParsedModules -> Project::type_check -> + CheckedModules -> CodeGenerator::build -> + AirTree -> AirTree::to_vec -> + Vec<Air> -> CodeGenerator::uplc_code_gen -> + Program / Term -> serialize -> + . +``` +We'll pick our way through these steps. + +At a high level we are trying to do something straightforward: reformulate aiken code as uplc. +Some aiken expressions are relatively easy to handle: for example, an aiken `Int` goes to an `Int` in uplc. +Some aiken expressions require more involved handling: for example, an aiken `if ... else if ... else` +must have the branches "nested" in uplc. +Aiken also has lots of nice-to-haves like pattern matching, modules, and generics. +Uplc has none of these. + +### The Preamble + +#### Cli handling + +The cli enters at `aiken/src/cmd/mod.rs` which parses the command. +With some establishing of context, the program enters `Project::build` (`crates/aiken-project/src/lib.rs`), +which in turn calls `Project::compile`. 
+ +#### File crawl + +The program looks for aiken files in both `./lib` and `./validators` subdirs. +For each it walks over all contents (recursively) looking for files with the `.ak` extension. +It treats these two sets of files a little differently. +For example, only validator files can contain the special validator functions. + +#### Parse and Type check + +`Project::parse_sources` parses the module source code. +The heavy lifting is done by `aiken_lang::parser::module`, which is evaluated on each file. +It produces a `Module` containing a list of parsed definitions of the file: functions, types _etc_, +together with metadata like docstrings and the file path. + +`Project::type_check` inspects the parsed modules and, as the name implies, checks the types. +It flags type-level warnings and errors and constructs a hash map of `CheckedModule`s. + +#### Code generator + +The code generator `CodeGenerator` (`aiken-lang/src/gen_uplc.rs`) is given +the definitions found from the previous step, +together with the plutus builtins. +It has additional fields for things like debugging. + +This is handed over to a `Blueprint` (`aiken-project/src/blueprint/mod.rs`). +The blueprint does little more than find the validators on which to run the code gen. +The heavy lifting is done by `CodeGenerator::generate`. + +We are now ready to take the source code and create plutus. + +### In the air + +Things become a bit intimidating at this point in terms of sheer lines of code: +`gen_uplc.rs` and three modules in `gen_uplc/` total > 8500 LoC. + +Aiken has its own _intermediate representation_ called `air` (as in Aiken Intermediate Representation). +Intermediate representations are common in compilers. +`Air` is defined in `aiken-lang/src/gen_uplc/air.rs`. +Unsurprisingly, it looks a little bit like a language between aiken and plutus. + +In fact, Aiken has another intermediate representation: `AirTree`. +This is constructed between the `TypedExpr` and `Vec<Air>`, ie between parsed aiken and air. 
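To make the idea of a tree-shaped representation sitting in front of a flat one more concrete, here is a toy sketch in Rust. It is not aiken's code: `ToyTree` and `ToyAir` are hypothetical stand-ins for a tree IR and a linear IR, `to_vec` only borrows the name of the real method, and the pre-order traversal is an assumption made for the sketch.

```rust
// Toy stand-ins: NOT aiken's real `AirTree` / `Air` definitions.

/// A tree-shaped IR: each node owns its children.
#[derive(Debug, Clone, PartialEq)]
enum ToyTree {
    Int(i64),
    Add(Box<ToyTree>, Box<ToyTree>),
    If(Box<ToyTree>, Box<ToyTree>, Box<ToyTree>),
}

/// A flat IR: one entry per node, children follow their parent (pre-order).
#[derive(Debug, Clone, PartialEq)]
enum ToyAir {
    Int(i64),
    Add,
    If,
}

impl ToyTree {
    /// Flatten the tree into a vector without mutating the tree itself.
    fn to_vec(&self) -> Vec<ToyAir> {
        let mut out = Vec::new();
        self.flatten_into(&mut out);
        out
    }

    /// Recursive helper: push this node, then its children, onto `out`.
    fn flatten_into(&self, out: &mut Vec<ToyAir>) {
        match self {
            ToyTree::Int(n) => out.push(ToyAir::Int(*n)),
            ToyTree::Add(left, right) => {
                out.push(ToyAir::Add);
                left.flatten_into(out);
                right.flatten_into(out);
            }
            ToyTree::If(cond, then, otherwise) => {
                out.push(ToyAir::If);
                cond.flatten_into(out);
                then.flatten_into(out);
                otherwise.flatten_into(out);
            }
        }
    }
}

fn main() {
    let tree = ToyTree::If(
        Box::new(ToyTree::Int(1)),
        Box::new(ToyTree::Int(2)),
        Box::new(ToyTree::Add(
            Box::new(ToyTree::Int(3)),
            Box::new(ToyTree::Int(4)),
        )),
    );
    println!("{:?}", tree.to_vec());
}
```

The point of the sketch is only the shape of the two types: the tree is convenient to build and rearrange, while the vector is convenient to walk in order when emitting the target language.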
+ +#### Climbing the AirTree + +Within `CodeGenerator::generate`, `CodeGenerator::build` is called on the function body. +This takes a `TypedExpr` and constructs and returns an `AirTree`. +The construction is recursive as it traverses the recursive `TypedExpr` data structure. +More on what an airtree is and its construction below. +At the same time `self` is treated as `mut`, so we need to keep an eye on this too. +The method which is called and uses this mutability of self is `self.assignment`. +It does so by +```sample + self.assignment >> self.expect_type_assign >> self.code_gen_functions.insert +``` +and thus is creating a hashmap of all the functions that appear in the definition. +From call to return, `assignment` covers > 600 LoC, so we'll otherwise leave it as a black box. +(`self.handle_each_clause` is also called with `mut` which in turn calls `self.build` for which `mut` is needed.) + +Validators in aiken are boolean functions while in uplc they are unit-valued (aka void-valued) functions. +Thus the airtree is wrapped such that `false` results in an error (`wrap_validator_condition`). +I don't know why there is a prevailing thought that boolean functions are preferable to functions +that error if anything is wrong - which is what validators are. + +`check_validator_args` again extends the airtree from the previous step, +and again calls `self.assignment` mutating self. +Something interesting is happening here. +Script context is the final argument of a validator - for any script purpose. +`check_validator_args` treats the script context like it is an unused argument. +The importance of this is not immediate, and I have yet to appreciate why this happens. 
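Returning to the boolean versus unit-valued point above, the wrapping can be pictured with a small Rust sketch. This is an analogy only: the function below is a hypothetical free function that borrows the name `wrap_validator_condition`, it wraps a closure rather than rewriting an airtree, the single `i64` argument is a made-up stand-in for validator arguments, and `Result<(), String>` stands in for "return unit or error".

```rust
// Analogy only: the real `wrap_validator_condition` works on an airtree,
// not on Rust closures.

/// Turn a boolean check into a "unit or error" check:
/// `true` becomes `Ok(())`, `false` becomes an error.
fn wrap_validator_condition<F>(validator: F) -> impl Fn(i64) -> Result<(), String>
where
    F: Fn(i64) -> bool,
{
    move |redeemer| {
        if validator(redeemer) {
            Ok(())
        } else {
            Err("validator returned false".to_string())
        }
    }
}

fn main() {
    // A toy boolean "validator" over a hypothetical integer redeemer.
    let is_forty_two = |redeemer: i64| redeemer == 42;
    let wrapped = wrap_validator_condition(is_forty_two);

    println!("{:?}", wrapped(42));
    println!("{:?}", wrapped(7));
}
```

Seen this way, the boolean form is a convenience for the author of the validator, and the wrapper recovers the error-or-nothing behaviour that the target language expects.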
+ +Let's take a look at what AirTree actually is: +```rust +pub enum AirTree { + Statement { + statement: AirStatement, + hoisted_over: Option<Box<AirTree>>, + }, + Expression(AirExpression), + UnhoistedSequence(Vec<AirTree>), +} +``` +Note that `AirStatement` and `AirExpression` are mutually recursive definitions with `AirTree`. +Otherwise, it would be unclear from first inspection how tree-like this really is. + +`AirExpression` has multiple constructors. These include (non-exhaustive) + +- air primitives (including all the ones that appear in plutus) +- constructors `Call` and `Fn` to handle anonymous functions +- binary and unary operators +- handling when and if +- handling error and tracing + +`AirStatement` also has multiple constructors. These include + +- let assignments and named function definitions +- handling expect assignments +- pattern matching +- unwrapping datastructures + +Note that `AirTree` has many methods that are partial functions, +as in there are possible states that are not considered legitimate +at different points of its construction and use. +For example `hoist_over` will throw an error if called on an `Expression`. +As `AirTree` is for internal use only, the scope for potential problems is reasonably contained. +It seems likely this is to avoid similar-yet-different IRs between steps. +However, the trade-off is that it partially obfuscates what is a valid state where. + +What is hoisting? Hoisting gives the airtree depth. +The motivation is that by the time we hit uplc it is "generally better" +that + +- function definitions appear once rather than being inlined multiple times +- the definition appears as close to use as possible + +Hoisting creates tree paths. +The final airtree to airtree step is `self.hoist_functions_to_validator`, which traverses the paths. +There is a lot of mutating of self, making it quite hard to keep a handle on things. +In all this (several thousand?) 
LoC, it is essentially ascertaining in which node of the tree +to insert each function definition. +In a resource constrained environment like plutus, this effort is warranted. + +At the same time this function deals with + +- monomorphisation - no more generics +- erasing opaque types + +Neither of these exists at the uplc level. + +#### Into Air + +The step `to_vec : AirTree -> Vec<Air>` is much easier to digest. +For one, it is not evaluated in the context of the CodeGenerator, +and two, there is no mutation of the airtree. +The function recursively takes nodes of the tree and maps them to entries in a mutable vector. +It flattens the tree to a vec. + +### Down to uplc + +Next we go from `Vec<Air> -> Term`. +This step is a little more involved than the previous. +For one, this is executed in the context of the code generator. +Moreover, the code generator is treated as mutable - ouch. + +On further inspection we see that the only mutation is setting `self.needs_field_access = true`. +This flag informs the compiler that, if true, additional terms must be added in one of the final steps +(see `CodeGenerator::finalize`). + +As noted above, some of the mappings from air to terms are immediate like `Air::Bool -> Term::bool`. +Others are less so. +Some examples: + +- `Air::Var` requires 100 LoC to do case handling on different constructors. +- Lists in air have no immediate analogue in uplc +- builtins, as in built-in functions (standard shorthand), have to be mediated +with some combination of `force` and `delay` in order to behave as they should. +- user functions must be "curried", ie treated as a sequence of single argument functions, +and recursion must be handled +- Do some magic in order to efficiently allow "record updates". + +#### Cranking the Optimizer + +There is a sequence of operations performed on the uplc mapping `Term -> Term`. +These remove inconsequential parts of the logic that appear in the generated code. 
+These include: + +- removing application of the identity function +- directly substituting where apply lambda is applied to a constant or builtin +- inlining or simplifying where apply lambda is applied to a param that appears once or not at all + +Each of these optimizing methods has its own relatively narrow focus, +and so although there is a fair number of LoC, it's reasonably straightforward to follow. +Some are applied multiple times. + +### The End + +The generated program can now be serialized and included in the blueprint. + +### Plutus Core Signposting + +All this fuss is to get us to a point where we can write uplc - and good uplc at that. +Note that there are many ways to generate code and most of them are bad. +The various design decisions and compilation steps make more sense +when we have a better understanding of the target language. + +Uplc is a lambda calculus. +For a comprehensive definition of uplc, check out the specification found +[here](https://github.com/input-output-hk/plutus/#specifications-and-design) from the plutus github repo. +(I imagine this link will be maintained longer than the current actual link.) +If you're not at all familiar with lambda calculus I recommend +[an unpacking](https://crypto.stanford.edu/~blynn/lambda/) by Ben Lynn. + +### What next? + +I think it would be helpful to have some examples... Watch this space. \ No newline at end of file diff --git a/content/drafts/unpicking-aiken-air.md b/content/drafts/unpicking-aiken-air.md deleted file mode 100644 index 6c23e56..0000000 --- a/content/drafts/unpicking-aiken-air.md +++ /dev/null @@ -1,228 +0,0 @@ -Aims: - -- Describe the pipeline, and components getting from aiken to uplc. - -## Preface - -Aiken is undergoing active development. -This post was started Aiken ~v1.14. -With Aiken v1.15, there were already reasonably significant changes to the compilation pipeline. 
-The word is that there aren't as big changes in the near future, but -this article will undoubtably begin to diverge from the current codebase even before publishing. - -## Aiken build - -Tracing `aiken build`, the pipeline is roughly something like: -``` - . -> Project::read_source_files -> - Vec -> Project::parse_sources -> - ParsedModules -> Project::type_check -> - CheckedModules -> CodeGenerator::build -> - AirTree -> AirTree::to_vec -> - Vec -> CodeGenerator::uplc_code_gen -> - Program / Term -> serialize -> - . -``` -We'll pick our way through these steps - -At a high level we are trying to do something straightforward: reformulate aiken code as uplc. -Some aiken expressions are relatively easy to handle for example an aiken `Int` goes to an `Int` in uplc. -Some aiken expressions require more involved handling, for example an aiken `If... If Else... Else ` -must have the branches "nested" in uplc. - -### The Preamble - -#### cli handling - -The cli enters at `aiken/src/cmd/mod.rs` which parses the command. -With some establishing of context, the program enters `Project::build` (`crates/aiken-project/src/lib.rs`), -which in turn calls `Project::compile`. - -#### File crawl - -The program looks for aiken files in both `./lib` and `./validator` subdirs. -For each it walks over all contents (recursively) looking for `.ak` extensions. -It treats these two sets of files a little differently. -Only validator files can contain the special validator functions. - -#### Parse and Type check - -`Project::parse_sources` parses the module source code. -The heavy lifting is done by `aiken_lang::parser::module`, which is evaluated on each file. -It produces a `Module` containing a list of parsed definitions of the file: functions, types _etc_, -together with "metadata" like docstrings and the file path. - -`Project::type_check` inspects the parsed modules and, as the name implies, checks the types. -It flags type level warnings and errors. 
-It constructs a hash map of `CheckedModule`s. - -#### Code generator - -The code generator `CodeGenerator` (`aiken-lang/src/gen_uplc.rs`) is given -the definitions found from the previous step, -together with the plutus builtins. -It has additional fields for things like debugging. - -This is handed over to a `Blueprint` (`aiken-project/src/blueprint/mod.rs`). -A blueprint does little more than find the validators on which to run the code gen. -The heavy lifting is done by `CodeGenerator::generate`. - -We are now ready to take the source code and create plutus. - -### Up in the air - -Things become a bit intimidating at this point in terms of sheer lines of code: -`gen_uplc.rs` and three modules in `gen_uplc/` totals > 8500 LoC. - -Aiken has its own _intermediate representation_ called `air` (as in Aiken Intermediate Representation). -These are common in compiled languages. -`Air` is defined in `aiken-lang/src/gen_uplc/air.rs`. -Unsurprisingly, it looks little bit like a language between aiken and plutus. - -In fact, Aiken has another intermediate representation: `AirTree`. -This is constructed between the `TypedExpr` and `Vec` ie between parsed aiken and air. - -#### AirTree - -Within `CodeGenerator::generate`, `CodeGenerator::build` is called on the function body. -This constructs and returns an `AirTree`. -More on what an airtree is and its construction below. -At the same time `self` is treated as `mut`, so we need to keep an eye on this too. -The method which is called and uses this mutability of self is `self.assignment`. -It does so by -```sample - self.assignment >> self.expect_type_assign >> self.code_gen_functions.insert -``` -and thus is creating a hashmap of all the functions that appear in the definition. -(`self.handle_each_clause` is also called with `mut` which in turn calls `self.build` for which `mut` it is needed. -`self.clause_pattern` is called with `mut` but it isn't used.) 
- -###### Codegen assignment - -~200 LoC - -###### Codegen expect type assign - -~400 LoC - -###### ... Back to build - -Validators in aiken are boolean functions while in uplc they are unit-valued (aka void-valued) functions. -Thus the airtree is wrapped such that `false` results in an error (`wrap_validator_condition`). -(Ed: I don't know why there is a prevailing thought that boolean functions are preferable than functions -that simply error if anything is wrong.) - -`check_validator_args` again extends the airtree from the previous step, -and again calls `self.assignment` mutating self. -Something interesting is happening here. -Script context is the final argument of a validator - for any script purpose. -`check_validator_args` treats the script context like it is an unused argument. -We'll circle back to how this works later on. - -Next we encounter -```rust - AirTree::no_op().hoist_over(validator_args_tree); -``` -Its not very apparent why we need to do this. Let's look ahead and consider this later. - -The final airtree to step(s) are in `self.hoist_functions_to_validator`. -TODO: What happens here?! - - - - -Note that `AirTree` and its methods aren't fully typesafe. -For example `hoist_over` will throw an error if called on an `Expression`. -As `AirTree` is for internal use only, the scope for potential problems is reasonably contained. - - - - -The AirTree has the following definition -```rust -pub enum AirTree { - Statement { - statement: AirStatement, - hoisted_over: Option>, - }, - Expression(AirExpression), - UnhoistedSequence(Vec), -} -``` -We can see it has a tree-like structure, as the name suggests. - -`AirExpression` has multiple constructors. These include (non-exhaustive) -- air primitives (including all the ones that appear in plutus) -- constructors `Call` and `Fn` to handle functions -- binary and unary operators -- handling when and if -- error and tracing - -`AirStatement` also has multiple constructors. 
- - - - - -for handling functions, `plutus primitives, along with -An `AirStatement` - - - -## Down to uplc - - - -## Air - -Aiken compiles aiken code to uplc via _air_: -Aiken Intermediate Representation. - -## Trace - -Running `aiken build`... - -The cli (See `aiken/src/cmd/mod.rs`) parses the command, -finds the context and calls `Project::build` (`crates/aiken-project/src/lib.rs`), -which in turn calls `Project::compile`. - -#### `Project::compile` - -1. Check dependencies are available _eg_ aiken stdlib. -2. Read source files. - 1. Walk over `./lib` and `./validators` and push aiken modules onto `Project.sources`. -3. Parse each source in sources: - 1. Generate a `ParsedModule` containing the `ast`, `docs`, _etc_. - The `ast` here is an `UntypedModule`, which contains untyped definitions. -4. Type check each parsed module. - 1. For each untyped module, create a `CheckedModule`. - This includes typed definitions. -5. `compile` forks into two depending on whether it's been called with `build` or `check`. -6. From `CheckModules` construct a `CodeGenerator` -7. Pass the generator to construct a new `Blueprints`. - 1. Blueprints finds validators from checked modules. - 2. From each it constructs a `Validator` with the constructor `Validator::from_checked_module` (which returns a vector of validators) - 1. Its here that the magic happens: The method `generator.generate(def)` is called, - where `def` is the typed validator(s). - This method outputs a `Program` which contains the UPLC. - 2. These are collected together. - 3. The rest is collecting and handling the errors and warnings and writing the blueprint. - - -#### `CodeGenerator::generate` - -1. Create a new `AirStack`. - - -#### `AirStack` - -Consists of: -1. An Id -2. A `Scope` -3. A vector of `Air` - -The Scope keeps track of ... [TODO] - -#### Air - -Air is a typed language... [TODO]