- We now consistently desugar an `expect` in the last position as `Void`, regardless of the pattern. Desugaring to a boolean value is deemed too confusing.
- This commit also removes the desugaring for let-bindings; it's only ever allowed for _expect_, which then behaves like a side effect.
- We also now allow tests to return either `Bool` or `Void`. A test that returns `Void` is treated the same as a test returning `True` (see the sketch right after this list). This is debatable, but I would argue that it's been sufficiently annoying for people, and such low-hanging fruit, that we ought to do something about it.
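For example, a minimal sketch of what this enables (the test body is made up):

```aiken
test expect_some_result() {
  let result = Some(42)
  // the trailing expect desugars into the assignment followed by `Void`,
  // and a test returning `Void` now passes just like one returning `True`
  expect Some(_) = result
}
```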
The strategy here is simple: when we find a sequence of expressions that ends with an assignment (let or expect), we simply desugar it into two expressions: the assignment followed by either `Void` or a boolean.
The latter is used when the assignment pattern is itself a boolean; that boolean then becomes the expression's result. The former, `Void`, is used for everything else. Said differently, any assignment implicitly _returns `Void`_, except for booleans, which return the actual patterned bool.
<table>
<thead><tr><th>expression</th><th>desugar into</th></tr></thead>
<tbody>
<tr>
<td>

```aiken
fn expect_bool(data: Data) -> Void {
  expect _: Bool = data
}
```

</td>
<td>

```aiken
fn expect_bool(data: Data) -> Void {
  expect _: Bool = data
  Void
}
```

</td>
</tr>
<tr>
<td>

```aiken
fn weird_maths() -> Bool {
  expect 1 == 2
}
```

</td>
<td>

```aiken
fn weird_maths() -> Bool {
  expect True = 1 == 2
  True
}
```

</td>
</tr>
</tbody>
</table>
This is useful when splitting modules for dependencies without wanting to expose internal constructors and types. This merely skips documentation generation; it doesn't prevent the hidden module from being accessible.
The idea is pretty simple: we just look for lines starting with a Markdown heading and render them in the documentation. They appear both in the sidebar and within the generated docs themselves, in between functions. This, coupled with preserving the order of declarations in a module, should make the generated docs significantly more elegant to organize and present.
The rationale is to leave them in the order they are defined, so that
library authors have some freedom in how they present information. On
top of that, we now also parse specific comments as section headers
that are inserted in the sidebar when present.
This also fixes a couple of other issues found thanks to conformance
tests. Some functions like multiply_integer or verify_ed25519_signature
have also slightly changed their costing functions.
There were some odd discrepancies for `integerToByteString` on the mem
side: either 1 or about 1000 mem units off, which I couldn't quite
figure out. Yet, this proves useful for validating builtins at large
and ensuring we have a valid cost model for v3.
This covers every proposal procedure but protocol parameter updates,
which are yet to be done. They span 30+ fields and felt like a big
enough piece to tackle on their own.
Alongside a bunch of other stuff from the coverage list. In
particular, the mint transaction contains:
- reference inputs
- multiple outputs, with assets, and type-0, type-1 and type-6
  addresses
- an output with a datum hash
- an output with an inline script
- an extra datum witness, preimage of the embedded hash
- a mint with 2 policies purposely ordered wrongly, holding 1 and 2
  assets respectively, also purposely ordered wrongly; one of the
  mints is actually a burn (i.e. negative quantity)
This is intense, as we still want to preserve the serializer for V1 &
V2, and I've tried as much as possible to avoid polluting the
application layer with many enum types such as:
```
pub enum TxOut {
    V1(TransactionOutput),
    V2(TransactionOutput),
    V3(TransactionOutput),
}
```
Those types make working with the script context cumbersome, and are
only truly required to provide different serialisation strategies. So
instead, we keep one top-level `TxInfo V1/V2/V3` type, and we pass
serialization strategies as type wrappers.
This way, the strategy propagates through the structure up until it's
eliminated when it reaches the relevant types.
All-in-all, this strikes a good balance between maintainability and
repetition, and it makes it possible to define _different but mostly
identical_ encoders for the various versions.
With it, I've been able to successfully encode a V3 script context and
match it against one produced using the Haskell libraries. More to
come.
Let's consider the following case:
```
type Var =
  Integer

type Vars =
  List<Var>
```
This incorrectly reports an infinite cycle, due to the inability to
properly type-check `Var`, on which `Vars` also depends. Yet the real
issue here is that `Integer` is an unknown type (the prelude's integer
type is `Int`).
This commit also upgrades miette to 7.2.0, so that we can also display
a better error output when the problem is actually a cycle.
The point of those tests is to ensure that blueprints are generated
properly, irrespective of the generated code. It is annoying to
constantly get those tests failing every time we introduce an
optimization or something that slightly changes the generated
UPLC.
It is impossible to serialize/deserialize a Data with a negative
constructor index. So the only way this can happen is by
programmatically constructing a value using the builtin constr_data.
While possible, it is entirely the responsibility of the programmer;
it is not malleable by an attacker, who can only provide values as
'Data' (and thus must have them decoded like any others).
Cloning a 'Term' is potentially dangerous, so we don't want it to
happen by mistake. Instead, we pass in var names and turn them into
terms when necessary.
While the ledger doesn't allow deserializing negative constr values,
they are still possible at the machine level. So we had better make
sure that we don't make assumptions regarding this.
This isn't sufficient, however, as the 'assignment' helper handling
code generation doesn't perform any check when patterns are vars. This
is curious and needs to be investigated further.
The spirit here is to make it easier to discover this syntax. People
have different intuitions about it, and the single pipe may not be the
most obvious one.
It is, however, the recommended syntax, and the formatter will rewrite
any of the others to it.
The syntax is as follows: `{ "bytes" = "...", "encoding" = "<encoding>" }`.
The following encodings are accepted: `"utf8"`, `"utf-8"`, `"hex"` and `"base16"`.
Note: the duplicates are only there to make it easier for people to
discover them by accident. When `"hex"` (resp. `"base16"`) is specified,
the bytes string will be decoded and must be a valid hex string.
This is currently extremely limited, as it only supports (UTF-8)
bytearrays and integers. We should seek to at least support hex byte
sequences, as well as bools, lists and possibly options.
For the latter, the rework on constants outlined in #992 is
necessary.
This is less confusing than getting an 'UnknownModule' error reporting
a different module name than the one actually being imported
('env').
Also, this commit fixes a few errors found in the type-checker
when reporting 'UnknownModule' errors. About half the time, we would
actually attach _imported modules_ instead of _importable modules_
to the error, making the neighboring suggestion much worse (nay
useless).
We figure out dependencies by looking at 'use' definitions in parsed
modules. However, in the case of environment modules, we must consider
all of them when seeing "use env". Without that, the env modules are
simply compiled in parallel and may not yet have been compiled when
they are needed as actual dependencies.
We simply provide a flag taking a free-form value, which acts as
the name of the module to look up in the 'env' folder. The strategy is
to replace the environment module name on-the-fly when a user tries to
import 'env'.
If the environment isn't found, an 'UnknownModule' error is raised
(which I will slightly adjust in a following commit to something more
related to environments).
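For illustration, here is a hypothetical sketch (the module contents and the exact flag value are assumptions, not taken from this change):

```aiken
// env/preview.ak -- one of possibly several environment modules
pub const network_id: Int = 0
```

```aiken
// lib/my_project/settings.ak -- 'env' resolves to the environment
// module selected on the command-line (e.g. something like `--env preview`)
use env

pub fn is_preview() -> Bool {
  env.network_id == 0
}
```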
There are a few important consequences to this design which may not
seem immediately obvious:
1. We parse and type-check every env module, even if it isn't used.
   This ensures that code doesn't break with a compilation error
   simply because people forgot to type-check a given env.
   Note that compilation could still fail because the env module
   itself could provide an invalid API. So this only prevents each
   module from being wrong when taken in isolation.
2. Technically, this also means that one can import env modules in
   other env modules by their names. I don't know if it's a good or
   bad idea at this point, but it doesn't really do any wrong;
   dependencies and cycles are handled all the same.
Using 'pallas' as a dependency brings in utxo-rpc and other annoying dependencies such as _tokio_. This not only makes the overall build longer, but it also prevents it from even working when targeting wasm.
- Doesn't allow pattern-matching on G1/G2 elements and strings,
  because the use cases for those are unclear and it adds complexity to
  the feature.
- We still _parse_ patterns on G1/G2 elements and strings, but emit an
  error in those cases.
- The syntax is the same as for bytearray literals (i.e. it supports hex,
  utf-8 strings or plain arrays of bytes); see the sketch below.
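Here is a hedged sketch of the three accepted literal forms used as patterns (the function is made up):

```aiken
fn classify(bytes: ByteArray) -> Int {
  when bytes is {
    // hex form
    #"00ff" -> 0
    // utf-8 string form
    "aiken" -> 1
    // plain array of bytes
    #[1, 2, 3] -> 2
    _ -> 3
  }
}
```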
There are currently two zero-arg builtins:
- mkNilData
- mkNilPairData
And while they take, strictly speaking, no arguments, the VM still
requires that they be called with an extra unit argument applied.
While this builtin is readily available through the Aiken syntax
`[head, ..tail]`, there's no reason not to support its builtin form,
even though we may not encourage its usage. For completeness and to
avoid bad surprises, it is now supported.
Fixes #964.
The original goal for this commit was to allow casting from Data on
patterns without annotation. For example, given some custom type
'OrderDatum':
```
expect OrderDatum { requested_handle, destination, .. }: OrderDatum = datum
```
would work fine, but:
```
expect OrderDatum { requested_handle, destination, .. } = datum
```
would not. Yet, the annotation feels unnecessary at this point,
because the type can be inferred from the pattern itself. So this
commit allows, whenever possible (i.e. when the pattern is neither a
discard nor a var), inferring the type from a pattern.
Along the way, I also found a couple of weird behaviours surrounding
this kind of assignment, in particular in combination with let. I'll
highlight those in the next PR (#979).
We've never been using those 'expected' tokens captured during
parsing, which is lame because they contain useful information!
This is much better than merely showing our infamous
"Try removing it!"
- Trace-if-false traces are now completely discarded in compact mode.
- Only the label (i.e. the first trace argument) is preserved.
- When compiling with _compact_ tracing, the first label MUST unify to
  a string. This shouldn't generally be an issue, and it enforces that
  traces follow the pattern
```
label: arg_0[, arg_1, ..., arg_n]
```
Note that, what isn't obvious with these changes, is that we now
support what the "emit" keyword was trying to achieve: we can compile
with user-defined traces only, and in compact mode keep only event
labels in the final contract, while allowing larger payloads with
verbose tracing.
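To illustrate, a hedged sketch (the names and payload are made up):

```aiken
fn must_be_signed(signer: ByteArray, owner: ByteArray) -> Bool {
  // compact mode keeps only the label @"not_owner" in the final contract;
  // verbose tracing also keeps the payload argument(s)
  trace @"not_owner": @"signer does not match the owner"
  signer == owner
}
```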
Actually, this has been a bug for a long time it seems. Calling any
prelude function using a qualified import would result in a codegen
crash. Whoopsie.
This is now fixed, as shown by the regression test.
This is not fully satisfactory as it pollutes the prelude a bit. Ideally, those functions should only be visible
and usable by the underlying trace code. But for now, we'll just go with it.
This commit introduces a new feature into
the parser, typechecker, and formatter.
The work for code gen will be in the next commit.
I was able to leverage some existing infrastructure
by making use of `AssignmentPattern`. A new field
`is` was introduced into `IfBranch`. This field holds
a generic `Option<Is>`, meaning a new generic has to be
introduced into `IfBranch`. When used in `UntypedExpr`,
`IfBranch` must use `AssignmentPattern`. When used in
`TypedExpr`, `IfBranch` must use `TypedPattern`.
The parser was updated such that we can support this
kind of pseudo grammar:
`if <expr:condition> [is [<pattern>: ]<annotation>]`
This can be read as: when parsing an `if` expression,
always expect an expression after the keyword `if`. Then,
optionally, there may be this `is` stuff, and within that you
may optionally expect a pattern followed by a colon. We will
always expect an annotation.
The first expression is still saved as the field
`condition` in `IfBranch`. If `pattern` is not there
AND `expr:condition` is an `UntypedExpr::Var`, we can set
the pattern to be a `Pattern::Var` with the same name. From
there, shadowing should allow this syntax sugar to feel
kinda magical within the `IfBranch` block that follows.
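Concretely, a hedged sketch of the resulting surface syntax (the function and types are invented):

```aiken
fn first_or_zero(data: Data) -> Int {
  // full form: `is <pattern>: <annotation>`
  if data is [x, ..]: List<Int> {
    x
  } else if data is Int {
    // sugared form: no pattern is given and the condition is a var,
    // so `data` itself is shadowed as an `Int` within this branch
    data
  } else {
    0
  }
}
```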
The typechecker doesn't need to be aware of the sugar
described above. The typechecker looks at `branch.is`
and, if it's `Some(is)`, it'll use `infer_assignment`
for some help. Because `is` can inject variables into the
scope of the branch's block, and since it's basically just
like how `expect` works minus the error, we get to re-use
that helper method.
It's important to note that, in the typechecker, if `is`
is `Some(_)` then we do not enforce that `condition` is
of type `Bool`. This is because the bool itself will be
whether or not the `is` holds true given a PlutusData
payload.
When `is` is `None`, we do exactly what was being done
previously, so that plain `if` expressions remain unaffected,
with no semantic changes.
The formatter had to be made aware of the new syntax, with
some simple changes that need no further explanation.
This is mainly a syntactic trick/sugar, but it's been pretty annoying
to me for a while that we can't simply pattern-match/destructure
single-variant constructors directly from the args list. A classic
example is when writing property tests:
```ak
test foo(params via both(bytearray(), int())) {
  let (bytes, ix) = params
  ...
}
```
This can now simply be replaced with:
```
test foo((bytes, ix) via both(bytearray(), int())) {
  ...
}
```
It feels natural, especially coming from the JavaScript, Haskell or
Rust worlds, and is mostly convenient. Behind the scenes, the compiler
does nothing more than rewriting the AST into the first form, with
pre-generated arg names. Then, we fully rely on the existing
type-checking capabilities; it thus works seamlessly, as if we
were just pattern matching inline.
There's no reason for this to be a property of only ArgName::Named to begin with. And now, with the extra indirection introduced for arg_name, it may lead to subtle issues when pattern args are used in validators.
I slightly altered the way we parse import definitions to ensure we
merge imports from the same modules (that aren't aliased) together.
This prevents an annoying warning with duplicated import lines and
makes it just more convenient overall.
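For example, two unaliased imports of the same module:

```aiken
use aiken/list.{filter}
use aiken/list.{map}
```

are now merged as if one had written:

```aiken
use aiken/list.{filter, map}
```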
As a trade-off, we can no longer interleave import definitions with
other definitions. This should be a minor setback only since the
formatter was already ensuring that all import definitions would be
grouped at the top.
---
Note that I originally attempted to implement this in the formatter
instead of the parser, as it felt more appropriate there. However, the
formatter operates on (immutable) borrowed definitions, which makes it
annoyingly hard to perform any AST manipulations. The `Document`
returned by the formatter carries a lifetime that prevents the creation
of intermediate local values.
So instead, slightly tweaking the parser felt like the right thing to
do.
While we agree on the idea of having some ways of emitting events, the
design hasn't been completely fleshed out and it is unclear whether
events should have a well-defined format independent of the framework
/ compiler and what this format should be.
So we need more time discussing and agreeing about what use case we
are actually trying to solve with that.
Irrespective of that, some cleanup was also needed on the UPLC side
anyway, since the PR introduced a lot of needless duplication.
This is the best we can do for this without
rearchitecting how we rewrite backpassing into
plain ol' assignments. In this case, if we see
a var and there is no annotation (thus probably not a cast),
then it's safe to rewrite to a `let` instead of an `expect`.
This way, we don't get a warning that is **unfixable**.
We are not trying to solve every little warning edge
case with this fix. We simply can't allow there
to be a warning that the user can't make go away through
some means. All other edge cases, like pattern matching on
a single-constructor type with expect, can have their
warnings fixed via other means.
This is crucial, as some checks regarding variable usage depend on
warnings; we may otherwise accidentally remove variables from the AST
as a consequence of backtracking for deep inference.
The current inference system walks expressions from "top to bottom":
starting from definitions higher in the source file, and going down.
When a call is encountered, we use the information known for the
callee definition at the moment it is inferred.
This causes interesting issues in the case where the callee doesn't
have annotations and is only partially known. For example:
```
pub fn list(fuzzer: Option<a>) -> Option<List<a>> {
  inner(fuzzer, [])
}

fn inner(fuzzer, xs) -> Option<List<b>> {
  when fuzzer is {
    None -> Some(xs)
    Some(x) -> Some([x, ..xs])
  }
}
```
In this small program, we infer `list` first and run into `inner`.
Yet, the arguments for `inner` are not annotated, so since we haven't
inferred `inner` yet, we will create two unbound variables.
And naturally, we will link the type of `[]` to being of the same type
as `xs` -- which is still unbound at this point. The return type of
`inner` is given by the annotation, so all-in-all, the unification
will work without ever having to commit to a type of `[]`.
It is only later, when `inner` is inferred, that we will generalise
the unbound type of `xs` to a generic which is the same as `b` in the
annotation. At this point, `[]` is also typed with this same generic,
which has a different id than `a` in `list` since it comes from
another type definition.
This is unfortunate and will cause issues down the line for the code
generation. The problem doesn't occur when `inner`'s arguments are
properly annotated or when `inner` is actually inferred first.
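For instance, the issue disappears once `inner` is fully annotated (same body as above):

```aiken
fn inner(fuzzer: Option<b>, xs: List<b>) -> Option<List<b>> {
  when fuzzer is {
    None -> Some(xs)
    Some(x) -> Some([x, ..xs])
  }
}
```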
Hence, I saw two possible avenues for fixing this problem:
1. Detect the presence of 'uncongruous generics' in definitions after
they've all been inferred, and raise a user error asking for more
annotations.
2. Infer definitions in dependency order, with definitions used in
others inferred first.
This commit does (2) (although it may still be a good idea to do (1)
eventually), since it offers a much better user experience. One way to
do (2) is to construct a dependency graph between function calls and
perform a topological sort.
Building such a graph is, however, quite tricky, as it requires walking
through the AST while maintaining scope etc., which is more-or-less
already what the inference step is doing; so it feels like double
work.
Thus instead, this commit does a depth-first inference and
"pauses" the inference of a definition when encountering a call, to
fully infer the callee first. To achieve this properly, we must ensure
that we do not infer the same definition twice, so we now "remember"
already inferred definitions in the environment.
Until now, we would pretty-print unbound variables the same way we would pretty-print generics. This turned out to be very confusing when debugging, as they have quite different semantics, and it helps to visualize unbound types in definitions.
This was somewhat wrong and corrected by codegen later on; we should be re-using the same generic id across an entire definition when the variable refers to the same element.
This should not happen; if it does, it's an error from the type-checker. So instead of silently swallowing the error and adopting a behavior which is only _sometimes_ right, it is better to fail loudly and investigate.
Refactor get_uplc_type to account for constr types that don't exactly resolve to a uplc type.
Check that the arg_stack in the uplc generator has only 1 argument at the end of generation.
Warning fixes.
Temporarily using the 'specialize-dict-key' branch from the stdlib
which makes use of Pair where relevant. Once this is merged back into
'main' we should update the acceptance test toml files to keep getting
them automatically upgraded.
This commit also fixes an oversight in the reification of data-types
now properly distinguishing between pairs and 2-tuples.
Co-authored-by: Microproofs <kasey.white@cardanofoundation.org>
Before this commit, we would parse 'Pair' as a user-defined
data-type, thus piggybacking on the whole record system. While
perhaps handy for some things, it's also semantically wrong, and it
induces a lot more complexity in codegen, which now needs to
systematically distinguish every data-type access between pairs and
others.
So it's better to have it as a separate expression and handle it
similarly to tuples (since it's fundamentally a 2-tuple with a special
serialization).
And a few more tests along the way for others. Note that it is important here that we try to parse a 'Pair' pattern BEFORE we try to parse a constructor pattern, because the latter would swallow any Pair pattern.
Currently, pattern-matching on 'Pair' is handled by treating Pair as a
record, which is slightly odd given that it isn't actually a record
and isn't user-defined. As it stands, every use of a record must
distinguish between Pairs and other kinds of records -- which screams
for another variant constructor instead.
We cannot use `Tuple` for this either, because then we would have no
way to tell 2-tuples apart from pairs, which is the whole point here.
So the most sensible thing to do is to define a new pattern `Pair`,
which is akin to tuples but simpler, since we know the number of
elements and it's always 2.
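Concretely, a hedged sketch of the new pattern at work:

```aiken
fn swap(p: Pair<Int, ByteArray>) -> Pair<ByteArray, Int> {
  // `Pair` now has its own pattern, akin to a 2-tuple's but distinct from it
  let Pair(left, right) = p
  Pair(right, left)
}
```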
We have been a bit too strict in disallowing 'allow_cast' propagation. This is really only problematic for nested elements like a Tuple's elements or an App's args. However, for linked and unbound vars it is probably okay, and it certainly is for function arguments.
It seems we can fix this by changing which side
gets subtracted by 1, depending on the op's associativity:
BinOp::Or & BinOp::And are right-associative, while the
other bin ops are left-associative.
Closes #893.
Co-authored-by: Kasey White <kwhitemsg@gmail.com>
This makes the search for counterexamples slower in some cases, by 30-40%, with the hope of finding better counterexamples. We might want to add a flag '--simplification-level' to the command-line to let users decide on the level of simplification.
And move some logic out of project/lib to be near the CheckedModule
instead. The project API is already quite heavy and long, so making it
more lightweight is generally what we want to tend towards.
This change ensures that we only compile modules from dependencies
that are used (or transitively used) in the project. This allows
discarding entire compilation steps at the module level, for modules
that we do not use.
The main goal of this change isn't performance. It's about making
dependency management slightly easier while we decide whether
and how we want to manage transitive dependencies in Aiken.
A concrete case here is aiken-lang/stdlib, which will soon depend on
aiken-lang/fuzz. However, we do not want to require every single
project depending on stdlib to also require fuzz. So instead, we want
to segregate the fuzz API from stdlib into separate modules, and only
compile those if they appear in the pruned dependency graph.
While the goal isn't performance, here are some benchmarks analyzing
the impact of deps pruning on a simple project depending on a few
modules from stdlib:
Benchmark 1: ./aiken-without-deps-pruning check scratchpad
Time (mean ± σ): 190.3 ms ± 101.1 ms [User: 584.5 ms, System: 14.2 ms]
Range (min … max): 153.0 ms … 477.7 ms 10 runs
Benchmark 2: ./aiken-with-deps-pruning check scratchpad
Time (mean ± σ): 162.3 ms ± 46.3 ms [User: 572.6 ms, System: 14.0 ms]
Range (min … max): 142.8 ms … 293.7 ms 10 runs
As we can see, this change seems to have an overall positive impact on
the compilation time.
Also slightly extended the check test 'framework' to allow registering side-dependencies and using them from another module. This allows checking the interplay between opaque types from within and outside of their host module.
is_assignment was a bit confusing to me, since we do actually categorize expect as an 'assignment'. So this is more about whether this is a *let* assignment. Hence 'is_let'.
Discard patterns are _dangerous_ if used recklessly. The problem comes
from maintenance, when adding new fields: we usually don't get any
compiler warnings, which may lead to missed spots and confusing
behaviors.
So I have, in some cases, inlined discards to explicitly list all
fields. That's a bit more cumbersome to write, but hopefully it will
catch a few things for us in the future.
The main trick here was transforming Assignment
to contain `Vec<(UntypedPattern, Option<Annotation>)>`
in a field called patterns. This then meant that I
could remove the `pattern` and `annotation` fields
from `Assignment`. The parser handles `=` and `<-`
just fine, because in the future `=` with multiple
patterns will mean some kind of optimization on tuples.
But since we don't have that optimization yet, when
someone uses multiple patterns with an `=`, an error is
returned from the type checker right where `infer_seq`
looks for `backpassing`. From there, the rest of the work
was in `Project::backpassing`, where I only needed to rework
some things to operate on a list of patterns instead of just one.
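For instance, a hedged sketch (where `map2` is a made-up function whose last argument is a two-argument callback):

```aiken
fn map2(x: Option<a>, y: Option<b>, f: fn(a, b) -> c) -> Option<c> {
  when x is {
    None -> None
    Some(a) ->
      when y is {
        None -> None
        Some(b) -> Some(f(a, b))
      }
  }
}

fn add_together(x: Option<Int>, y: Option<Int>) -> Option<Int> {
  // both names bind the two arguments of the callback; using `=` instead
  // of `<-` here is, for now, rejected by the type checker
  let a, b <- map2(x, y)
  a + b
}
```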
The 3rd assignment kind (Bind) is gone, and is now reflected through a boolean parameter. Note that this parameter is completely erased by the type-checker, so that the rest of the pipeline (i.e. code generation) doesn't have to make any assumptions: it simply can't see a backpassing let or expect.
This is more holistic and less awkward than having monadic bind work only with some pre-defined types. Backpassing works with _any_ function and can be implemented relatively easily by rewriting the AST on-the-fly.
Also, it is far easier to teach than trying to explain what a monadic bind is, how its behavior differs from type to type, and why it isn't generally available for any monadic type.
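As a concrete illustration, a hedged sketch of the rewrite (`and_then` is defined inline for self-containment rather than imported from a library):

```aiken
fn and_then(opt: Option<a>, f: fn(a) -> Option<b>) -> Option<b> {
  when opt is {
    None -> None
    Some(a) -> f(a)
  }
}

fn add_one(x: Option<Int>) -> Option<Int> {
  // rewritten on-the-fly into: and_then(x, fn(n) { Some(n + 1) })
  let n <- and_then(x)
  Some(n + 1)
}
```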
It might be slightly cleaner and more extensible to change this to return a summary, potentially even tracking the tests, coverage, etc. so it can be serialized to JSON. But for now, this is much simpler, and it's the approach that KtorZ suggested.