This is the best we can do for this without
rearchitecting how we rewrite backpassing to
plain ol' assignments. In this case, if we see
a var and there is no annotation (thus probably not a cast),
then it's safe to rewrite to a `let` instead of an `expect`.
This way, we don't get a warning that is **unfixable**.
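A sketch of the idea (`and_then` is a hypothetical callback helper,
and the rewritten form is illustrative, not the compiler's literal
output):
```
fn and_then(opt: Option<a>, next: fn(a) -> Option<b>) -> Option<b> {
  when opt is {
    None -> None
    Some(a) -> next(a)
  }
}

fn add_one(opt: Option<Int>) -> Option<Int> {
  // a bare, unannotated variable on the left of `<-`...
  let x <- and_then(opt)
  Some(x + 1)
}

// ...is rewritten with a plain `let` (which never warns) rather than
// an `expect` (whose warning the user couldn't make go away):
fn add_one_rewritten(opt: Option<Int>) -> Option<Int> {
  and_then(opt, fn(arg) {
    let x = arg
    Some(x + 1)
  })
}
```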
We are not trying to solve every little warning edge
case with this fix. We simply can't allow there
to be a warning that the user can't make go away through
some means. All other edge cases, like pattern matching on
a single-constructor type with expect warnings, can be fixed
via other means.
This is crucial, as some checks regarding variable usage depend on
warnings; we may otherwise accidentally remove variables from the AST
as a consequence of backtracking for deep inference.
The current inference system walks expressions from top to bottom,
starting from definitions higher in the source file and going down.
When a call is encountered, we use whatever information we have about
the callee definition at the moment the call is inferred.
This causes interesting issues when the callee doesn't have
annotations and is only partially known. For example:
```
pub fn list(fuzzer: Option<a>) -> Option<List<a>> {
inner(fuzzer, [])
}
fn inner(fuzzer, xs) -> Option<List<b>> {
when fuzzer is {
None -> Some(xs)
Some(x) -> Some([x, ..xs])
}
}
```
In this small program, we infer `list` first and run into `inner`.
Yet, the arguments of `inner` are not annotated, and since we haven't
inferred `inner` yet, we create two unbound variables.
Naturally, we link the type of `[]` to the type of `xs` -- which is
still unbound at this point. The return type of `inner` is given by
the annotation, so all-in-all, the unification works without ever
having to commit to a type for `[]`.
It is only later, when `inner` is inferred, that we generalise the
unbound type of `xs` into a generic which is the same as `b` in the
annotation. At this point, `[]` is also typed with this same generic,
which has a different id than `a` in `list` since it comes from
another type definition.
This is unfortunate and causes issues down the line for the code
generation. The problem doesn't occur when `inner`'s arguments are
properly annotated, or when `inner` happens to be inferred first.
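For instance, annotating `inner` from the example above removes the
ambiguity entirely (a sketch):
```
// with annotations, `xs` and `[]` unify with `a` immediately, so no
// uncongruous generic is ever created
fn inner(fuzzer: Option<a>, xs: List<a>) -> Option<List<a>> {
  when fuzzer is {
    None -> Some(xs)
    Some(x) -> Some([x, ..xs])
  }
}
```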
Hence, I saw two possible avenues for fixing this problem:
1. Detect the presence of 'uncongruous generics' in definitions after
they've all been inferred, and raise a user error asking for more
annotations.
2. Infer definitions in dependency order, with definitions used in
others inferred first.
This commit does (2) (although it may still be a good idea to do (1)
eventually) since it offers a much better user experience. One way to
do (2) is to construct a dependency graph between function calls, and
perform a topological sort.
Building such a graph is, however, quite tricky as it requires walking
through the AST while maintaining scope etc., which is more-or-less
what the inference step is already doing; so it feels like double
work.
Thus instead, this commit performs a depth-first inference, "pausing"
the inference of a definition when encountering a call in order to
fully infer the callee first. To achieve this properly, we must ensure
that we do not infer the same definition again, so we now "remember"
already inferred definitions in the environment.
Until now, we would pretty-print unbound variables the same way we would pretty-print generics. This turned out to be very confusing when debugging, as they have quite different semantics, and it helps to visualize unbound types in definitions.
This was somehow wrong and corrected by codegen later on, but we should be re-using the same generic id across an entire definition if the variable refers to the same element.
This should not happen; if it does, it's an error from the type-checker. So instead of silently swallowing the error and adopting a behavior which is only _sometimes_ right, it is better to fail loudly and investigate.
Refactor get_uplc_type to account for constr types that don't exactly resolve to a uplc type
Check arg_stack in uplc generator has only 1 argument at the end of the generation
warning fixes
Temporarily using the 'specialize-dict-key' branch from the stdlib
which makes use of Pair where relevant. Once this is merged back into
'main' we should update the acceptance test toml files to keep getting
them automatically upgraded.
This commit also fixes an oversight in the reification of data-types,
now properly distinguishing between pairs and 2-tuples.
Co-authored-by: Microproofs <kasey.white@cardanofoundation.org>
Before this commit, we would parse 'Pair' as a user-defined
data-type, thus piggybacking on that whole record system. While
perhaps handy for some things, it's also semantically wrong and
induces a lot more complexity in codegen, which now needs to
systematically distinguish every data-type access between pairs and
others.
So it's better to have it as a separate expression, and handle it
similar to tuples (since it's fundamentally a 2-tuple with a special
serialization).
And a few more tests along the way for others. Note that it is important here that we try to parse for a 'Pair' BEFORE we try to parse for a constructor pattern, because the latter would swallow any Pair pattern.
Currently, pattern-matching on 'Pair' is handled by treating Pair as a
record, which is slightly odd given that it isn't actually a record
and isn't user-defined. As a result, every use of a record must now
distinguish between Pairs and other kinds of records -- which screams
for another variant constructor instead.
We cannot use `Tuple` either for this, because then we have no way to
tell 2-tuples apart from pairs, which is the whole point here. So the
most sensible thing to do is to define a new pattern `Pair`, which is
akin to tuples, but simpler since we know the number of elements:
it's always 2.
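As a sketch, the new pattern mirrors the 2-tuple one:
```
// `Pair` now has its own pattern, akin to a 2-tuple but always binary
fn first(p: Pair<Int, ByteArray>) -> Int {
  when p is {
    Pair(left, _) -> left
  }
}
```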
We have been a bit too strict in disallowing 'allow_cast' propagation. This is really only problematic for nested elements, like a Tuple's elements or an App's args. For linked and unbound vars, however, it is probably okay, and it certainly is for function arguments.
It seems we can fix this by changing which side
gets subtracted by 1 depending on the op's associativity:
BinOp::Or and BinOp::And are right-associative, while the
other bin ops are left-associative.
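Concretely, a sketch of the resulting grouping:
```
fn left_assoc(a: Int, b: Int, c: Int) -> Int {
  // most binary operators associate left: parsed as (a - b) - c
  a - b - c
}

fn right_assoc(a: Bool, b: Bool, c: Bool) -> Bool {
  // BinOp::Or (and BinOp::And) associate right: parsed as a || (b || c)
  a || b || c
}
```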
closes #893
Co-authored-by: Kasey White <kwhitemsg@gmail.com>
This makes the search for counterexamples slower in some cases, by 30-40%, with the hope of finding better counterexamples. We might want to add a flag '--simplification-level' to the command-line to let users decide on the level of simplification.
And move some logic out of project/lib to be near the CheckedModule
instead. The project API is already quite heavy and long, so making it
more lightweight is generally what we want to tend towards.
This change ensures that we only compile modules from dependencies
that are used (or transitively used) in the project. This allows
discarding entire compilation steps at a module level, for modules
that we do not use.
The main goal of this change isn't performance. It's about making
dependency management slightly easier while we decide whether and how
we want to manage transitive dependencies in Aiken.
A concrete case here is aiken-lang/stdlib, which will soon depend on
aiken-lang/fuzz. However, we do not want to require every single
project depending on stdlib to also require fuzz. So instead, we want
to segregate the fuzz API from stdlib into separate modules, and only
compile those if they appear in the pruned dependency graph.
While the goal isn't performance, here are some benchmarks analyzing
the impact of deps pruning on a simple project that depends on a few
modules from stdlib:
```
Benchmark 1: ./aiken-without-deps-pruning check scratchpad
  Time (mean ± σ):     190.3 ms ± 101.1 ms    [User: 584.5 ms, System: 14.2 ms]
  Range (min … max):   153.0 ms … 477.7 ms    10 runs

Benchmark 2: ./aiken-with-deps-pruning check scratchpad
  Time (mean ± σ):     162.3 ms ±  46.3 ms    [User: 572.6 ms, System: 14.0 ms]
  Range (min … max):   142.8 ms … 293.7 ms    10 runs
```
As we can see, this change seems to have an overall positive impact on
the compilation time.
Also slightly extended the check test 'framework' to allow registering side-dependencies and using them from another module. This allows checking the interplay between opaque types from within and outside of their host module.
is_assignment was a bit confusing to me, since we do actually categorize expect as an 'assignment'. So this is more about whether this is a *let* assignment. Hence 'is_let'.
Discard patterns are _dangerous_ if used recklessly. The problem comes
with maintenance, when adding new fields: we usually don't get any
compiler warnings, which may lead to missed spots and confusing
behaviors.
So I have, in some cases, inlined discards to explicitly list all
fields. That's a bit more cumbersome to write but will hopefully catch
a few things for us in the future.
The main trick here was transforming Assignment
to contain `Vec<(UntypedPattern, Option<Annotation>)>`
in a field called patterns. This then meant that I
could remove the `pattern` and `annotation` fields
from `Assignment`. The parser handles `=` and `<-`
just fine because, in the future, `=` with multiple
patterns will mean some kind of optimization on tuples.
But since we don't have that optimization yet, when
someone uses multiple patterns with an `=`, an error is
returned from the type checker right where `infer_seq`
looks for `backpassing`. From there, the rest of the work
was in `Project::backpassing`, where I only needed to rework
some things to work with a list of patterns instead of just one.
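A sketch of what multiple patterns enable (`two_ints` is made up):
```
fn two_ints(next: fn(Int, Int) -> result) -> result {
  next(1, 2)
}

fn sum() -> Int {
  // several patterns bind the callback's arguments at once
  let a, b <- two_ints()
  a + b
}

// using `=` with several patterns is, for now, rejected by the
// type-checker where `infer_seq` looks for backpassing:
//
//   let a, b = two_ints()
```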
The third kind of assignment (Bind) is gone and is now reflected through a boolean parameter. Note that this parameter is completely erased by the type-checker, so the rest of the pipeline (i.e. code-generation) doesn't have to make any assumption: it simply can't see a backpassing let or expect.
This is more holistic and less awkward than having a monadic bind working only with some pre-defined types. Backpassing works with _any_ function, and can be implemented relatively easily by rewriting the AST on-the-fly.
Also, it is far easier to explain than a monadic bind: no need to explain how its behavior differs from type to type, nor why it isn't generally available for any monadic type.
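A sketch showing that any function whose last argument is a callback
qualifies (`with_default` is made up):
```
fn with_default(default: Int, next: fn(Int) -> Int) -> Int {
  next(default)
}

fn compute() -> Int {
  let n <- with_default(42)
  n * 2
}

// the AST above is rewritten on-the-fly to:
//
//   with_default(42, fn(n) { n * 2 })
```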
It might be slightly cleaner and more extensible to change this to return a summary, potentially even tracking the tests, coverage, etc. so it can be serialized to JSON. But, for now, this is much simpler, and the approach that KtorZ suggested.
We'll piggyback on the tracing capabilities of the VM to provide labelling for prop tests. To ensure we do not interfere with normal traces, we only count traces that start with a NUL byte as labels. That convention is assumed to be known by the companion fuzz library, which should then provide the labelling capability as a dedicated function.
I've been benchmarking that through the shrinking of 'large' lists, and the cache brings about a 1.5x speed increase. For small and simple cases, the cache has no visible effect (positive or negative).
Before this commit, we would always show the 'declared form' of type aliases, with their generic, non-instantiated parameters. This now tries to unify the annotation with the underlying inferred type to provide even better alias pretty printing.
Somehow, these have always been right-associative, when the natural thing to expect is left-associativity. It now matters when trying to crawl down a binary tree to display them properly.
Using ByteArrays as vectors on-chain is a lot more efficient than relying on an actual Data list of values. From the Rust end, it doesn't change much as we were already manipulating vectors anyway.
Also, this commit makes `apply_term` automatically re-intern the
program since it isn't safe to apply any term onto a UPLC program. In
particular, terms that introduce new let-bindings (via lambdas) will
mess with the already generated DeBruijn indices.
The problem doesn't occur for pure constant terms like Data. So we
still have a safe and fast version 'apply_data' when needed.
This was a mess, to say the least. The mess started when we wanted
to make all definitions in codegen use immutable maps of references --
which was, and still is, a good idea. Yet, the population of the data
type and function definitions was somehow done in a separate step,
in a rather ad-hoc manner.
This commit changes that to ensure the project's data_types and
functions are populated while type-checking the AST, such that we
don't need to redo it afterwards.
The code for registering the data type definitions and function
definitions was also duplicated in at least 3 places. It is now a
method of the TypedModule.
Note: this change isn't just cosmetic; it's also necessary for
the commit that follows, which aims at adding tests to the set of
available function definitions, thus making property tests
callable.
Those end-to-end tests are useful, both for controlling the behavior of the shrinker and for double-checking the reification of Plutus Data back into untyped expressions.
I had to work around a few things to get opaque types and private types to play nice. Also found a weird bug due to how we apply parameters after unique De Bruijn indexes have already been applied. A work-around is to re-intern the program.
True corresponds to Constr=1 and False corresponds to Constr=0; their positions in the vector shall reflect that. Note that while this would in principle impact codegen for any other type, it doesn't for bool, since we likely never looked up this type definition, it being well-known. It does now, as the 'reify' function relies on it. Whoopsie.
This is very very rough at the moment. But it does a couple of things:
1. The 'ArgVia' now contains an Expr/TypedExpr which should unify to a Fuzzer. This is to avoid having to introduce custom logic to handle fuzzer referencing. So this now accepts function calls, field accesses, etc., so long as they unify to the right thing.
2. I've done quite a lot of cleanup in aiken-project, mostly around the tests and the naming surrounding them. What we used to call 'Script' is now called 'Test' and is an enum between UnitTest (ex-Script) and PropertyTest. I've moved some boilerplate and relevant functions under those modules' impl blocks.
3. I've completed the end-to-end pipeline of:
- Compiling the property test
- Compiling the fuzzer
- Generating an initial seed
- Running property tests sequentially, threading the seed through each step.
An interesting finding is that I had to wrap the prop test in a wrapper similar to the one we use for validators, to ensure we convert primitive types wrapped in Data back to UPLC terms. This is necessary because the fuzzer returns a ProtoPair (and soon an Array) which holds 'Data'.
At the moment, we do nothing with the size, though the size should ideally grow after each iteration (up to a certain cap).
In addition, there are a couple of todo/fixme that I left in the code as reminders of what's left to do beyond the obvious (error and success reporting, testing, etc.)
The parameter is special: it takes no annotation but a 'via' keyword followed by an expression that should unify to a Fuzzer<a>, where Fuzzer<a> = fn(Seed) -> (Seed, a). The current commit only allows name identifiers for now. Ultimately, this may allow full expressions.
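A sketch of the syntax, assuming a `my_fuzzer` value defined elsewhere
that unifies to `Fuzzer<Int>`:
```
// the parameter carries no annotation; `via` links it to a fuzzer
// (only name identifiers are accepted at this stage)
test prop_add_zero(n via my_fuzzer) {
  n + 0 == n
}
```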
We cannot enforce internal invariants on opaque types from structural checks on Data alone. Thus, it is forbidden to find an opaque type in an outward-facing interface. Instead, users should rely on intermediate representations and lift them into opaque types using the constructors and methods provided by the type (e.g. Dict.from_list, Rational.from_int, Rational.new, ...).
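A sketch, with a hypothetical opaque `Natural` type guarding an
invariant:
```
pub opaque type Natural {
  Natural { inner: Int }
}

pub fn new(n: Int) -> Option<Natural> {
  if n >= 0 {
    Some(Natural { inner: n })
  } else {
    None
  }
}

// Forbidden: taking `Natural` straight from a datum would let raw Data
// bypass the `n >= 0` invariant.
//
//   fn spend(datum: Natural, ...) -> Bool
//
// Instead, take the intermediate representation and lift it:
pub fn from_datum(raw: Int) -> Option<Natural> {
  new(raw)
}
```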
We've been wrongly representing large ints as BigInt, causing them to
behave differently in the VM through builtins like 'serialise_data'.
Indeed, we expect anything that fits in 8 bytes to be encoded as Major
Type 0 or 1. But we were switching to encoding as Major type 6
(tagged, PosBigInt, NegBigInt) for much smaller values! Anything
outside of the range [-2^32, 2^32-1] would be treated as big int
(positive or negative).
Why? Because we checked whether a value would fit in an i64, and if
it didn't, we treated it as a big int. But the reality is more subtle...
Fortunately, Rust has i128, and the minicbor library implements TryFrom,
which enforces that the value fits in the range [-2^64, 2^64 - 1], so
we're back on track easily.
While looking at some code, I noticed that this
warning would show up even if an error for a
non-exhaustive when/is had been reported for the same when/is
expression. That isn't a useful situation to show this
warning: things are not exhaustive yet, so we should
let the user finish and only report the errors. If things
are exhaustive, the code proceeds; and if a warning was set
when there's only one clause pattern, then this warning message
can be pushed, because that's when it's actually useful.
This commit allows Data to be optionally annotated with a
phantom type. This doesn't change anything in codegen, but we can now
leverage this information to generate better blueprint schemas.
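A sketch, with a made-up `MyDatum`:
```
type MyDatum {
  MyDatum { count: Int }
}

// codegen still sees plain `Data`, but the blueprint schema for this
// function can now reference `MyDatum`'s shape
fn accept(datum: Data<MyDatum>) -> Bool {
  todo
}
```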
Note that the formatter rewrites parens-block sequences as curly-block
sequences anyway. Albeit weird-looking syntax, they are valid
nonetheless.
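For instance, a sketch:
```
fn example() -> Int {
  // a parens-block sequence: weird looking, but valid
  let x = (
    let y = 1
    y + 1
  )
  // the formatter rewrites it as:
  //
  //   let x = {
  //     let y = 1
  //     y + 1
  //   }
  x
}
```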
I also clarified a bit the hints and description of the
'illegal::return' error as it would mistakenly say 'function' instead
of 'block'.
- do not erase sequences if the sole expression is an assignment
- emit parse error if an assignment is assigned to an assignment
- do not allow assignments in logical op chains
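Sketches (hypothetical snippets) of what each rule affects:
```
// still parsed (rather than silently erased), so the type-checker can
// report on it: a sequence whose sole expression is an assignment
//
//   let x = { let y = 1 }
//
// now a parse error: an assignment assigned to an assignment
//
//   let x = let y = 1
//
// and assignments may no longer appear inside a logical op chain:
//
//   and {
//     let a = True,
//     a,
//   }
```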
This reverts commit 21f0b3a6220fdafb8f6aad6855de89d8cdde0e1b.
Rationale:
The absence of a clause guard here was done *on purpose*. Indeed,
introducing a clause guard forces either duplication or the use
of a wildcard, which is not "future proof".
Should we make a change to that one day (e.g. add a new variant to
TraceLevel), we won't get any compiler warning and we'll very likely
forget to update that particular section of the code.
So, as much as possible, enforcing complete pattern-matches on
variants makes for code that is easier to maintain in the long run.
This allows for more fine-grained control over how the traces are shown. Now users can instruct the compiler to preserve only their user-defined traces, or only the compiler-generated ones, or all, or none. We also want to add another trace level on top of that: 'compact', to only show line numbers; it will work for both user-defined and/or compiler-generated traces.
We rely on some errors to just bubble up and get printed.
By matching on result at the top level like this, we blocked some
error messages from being printed at all. For me, this showed up
when `cargo run -- new thing/thing` printed nothing even when there
was an existing `thing` folder. It has already been the pattern for
some time for some subcommands to handle calling process::exit(1) in
situations where error reporting needs more special handling. It may
seem lame, hacky, or repetitive, but it's easy to maintain and read.
This is a *slight* hack / abuse of the code() method as we are now
doing a bit of formatting within that function. Yet, we only do so at
the very top-level (i.e. project's Error) because we can't actually
fiddle with how miette presents errors.
Also removed the 'clear' flag to do it by default instead of clogging
the terminal view.
This now works pretty nicely, and the logic is back under
`aiken_project`.
Rather than have this logic in the aiken binary, this provides a generic
mechanism to do "something" on file change events. KtorZ is going to
handle wiring it up to the CLI in the best way for the project.
I tried to write some tests for this, but it's hard to isolate the
watcher logic without wrestling with the borrow checker, or overly
neutering this utility.
This adds the following command
```
aiken watch
```
There are some open questions to answer, though:
- I really like the ergonomics of `aiken watch`; but it also makes sense
as a flag to `aiken check` or `aiken build` etc.; should we just
support the flag, the command, or both?
- Right now I duplicated the with_project method, because it forces
process::exit(1); Should we refactor this, and if so, how?
- Are there other configuration options we want?
To pass 2 of the conformance tests, we need to make sure
that we aren't typechecking builtin arguments as arguments
are applied. This is done by removing the call to check_type
and then reworking all the associated unwrap methods on Value
so that they return the same errors that were being returned before.
feat: impl flat serialization and deserialization for bls constants
feat: started on cost models for the new builtins
Co-authored-by: rvcas <x@rvcas.dev>
- sort alphabetically
- add some of the missing builtins used for ints
- comment on what is "correct" for future additions
- comment on the current remaining missing builtins
- comment on the current incoherent method names
This was somewhat weirdly done, with a boolean 'imported' set on the
former, but an explicit new warning for values. I don't see the point
of distinguishing them, so I just merged them all into a single
warning.
I have however preserved the 'UnusedType' and 'UnusedConstructor'
warnings since they were ALSO used for unused private constructors or
types.
I initially removed 'UnknownTypeConstructor' since it wasn't used anywhere and was in fact dead code. On second thought, however, it is nicer to provide a slightly better error message when a constructor is missing, as well as some valid suggestions. Prior to this commit, we would simply return an 'UnknownVariable', and the hint might suggest lowercase identifiers; which is wrong.
I previously missed a case, which caused qualified imports to be added at the end if they were lexicographically smaller than ALL other qualified imports. No big deal, but this is now fixed.
We're going to have more quickfixes, so it's best not to overload the
'server' module. Plus, there's a lot of boilerplate around the
quickfixes, so we might want to factor it out.
It's a bit 'off-topic' to keep these in aiken-lang, as those functions are really just about the LSP. Plus, it removes some of the boilerplate and makes the entire edit logic more readable and re-usable. Now we can tackle other similar errors with the same quickfix.
This removes the need to rely on the formatter to clear things up
after inserting a new import. While this is not so useful for imports, I
wanted to experiment with the approach for future similar edits (for
example, when suggesting an inline rewrite).
- Add support to the formatter for these doc comments
- Add a new field to `Arg` `doc: Option<String>`
- Don't attach docs immediately after typechecking a module
- instead we should do it on demand in docs, build, and lsp
- the check command doesn't need to have any docs attached
- doing it more lazily defers the computation until later making
typechecking feedback a bit faster
- Add support for function arg and validator param docs in
`attach_module_docs` methods
- Update some snapshots
- Add put_doc to Arg
closes #685