Use double-quotes for utf-8 bytearrays, and @"..." for string literals

The core observation is that **in the context of Aiken** (i.e. on-chain logic)
  people do not generally want to use String. Instead, they want
  bytearrays.

  So, it should be easy to produce bytearrays when needed and it should
  be the default. Before this commit, `"foo"` would parse as a `String`.
  Now, it parses as a `ByteArray`, whose bytes are the UTF-8 bytes
  encoding of "foo".

  Now, to make this change really "fool-proof", we now want to:

  - [ ] Emit a parse error if we parse a UTF-8 bytearray literal in
    place where we would expect a `String`. For example, `trace`,
    `error` and `todo` can only be followed by a `String`.

    So when we see something like:

    ```
    trace "foo"
    ```

    we know it's a mistake and we can suggest users to use:

    ```
    trace @"foo"
    ```

    instead.

  - [ ] Emit a warning if we ever see a bytearray literals UTF-8, which
    is either 56 or 64 character long and is a valid hexadecimal string.
    For example:

    ```
    let policy_id = "29d222ce763455e3d7a09a665ce554f00ac89d2e99a1a83d267170c6"
    ```

    This is _most certainly_ a mistake, as this generates a ByteArray of
    56 bytes, which is effectively the hex-encoding of the provided string.

    In this scenario, we want to warn the user and inform them they probably meant to use:

    ```
    let policy_id = #"29d222ce763455e3d7a09a665ce554f00ac89d2e99a1a83d267170c6"
    ```
This commit is contained in:
KtorZ
2023-02-17 19:15:13 +01:00
parent 98b89f32e1
commit 53fb821b62
7 changed files with 138 additions and 109 deletions

View File

@@ -77,13 +77,21 @@ pub fn lexer() -> impl Parser<char, Vec<(Token, Span)>, Error = ParseError> {
.or(just('t').to('\t')),
);
let string = just('"')
let string = just('@')
.ignore_then(just('"'))
.ignore_then(filter(|c| *c != '\\' && *c != '"').or(escape).repeated())
.then_ignore(just('"'))
.collect::<String>()
.map(|value| Token::String { value })
.labelled("string");
let bytestring = just('"')
.ignore_then(filter(|c| *c != '\\' && *c != '"').or(escape).repeated())
.then_ignore(just('"'))
.collect::<String>()
.map(|value| Token::ByteString { value })
.labelled("bytestring");
let keyword = text::ident().map(|s: String| match s.as_str() {
"trace" => Token::Trace,
"error" => Token::ErrorTerm,
@@ -158,16 +166,18 @@ pub fn lexer() -> impl Parser<char, Vec<(Token, Span)>, Error = ParseError> {
comment_parser(Token::ModuleComment),
comment_parser(Token::DocComment),
comment_parser(Token::Comment),
choice((ordinal, keyword, int, op, newlines, grouping, string))
.or(any().map(Token::Error).validate(|t, span, emit| {
emit(ParseError::expected_input_found(
span,
None,
Some(t.clone()),
));
t
}))
.map_with_span(|token, span| (token, span)),
choice((
ordinal, keyword, int, op, newlines, grouping, bytestring, string,
))
.or(any().map(Token::Error).validate(|t, span, emit| {
emit(ParseError::expected_input_found(
span,
None,
Some(t.clone()),
));
t
}))
.map_with_span(|token, span| (token, span)),
))
.padded_by(one_of(" \t").ignored().repeated())
.recover_with(skip_then_retry_until([]))

View File

@@ -8,6 +8,7 @@ pub enum Token {
UpName { name: String },
DiscardName { name: String },
Int { value: String },
ByteString { value: String },
String { value: String },
// Groupings
NewLineLeftParen, // ↳(
@@ -97,6 +98,7 @@ impl fmt::Display for Token {
Token::DiscardName { name } => name,
Token::Int { value } => value,
Token::String { value } => value,
Token::ByteString { value } => value,
Token::NewLineLeftParen => "↳(",
Token::LeftParen => "(",
Token::RightParen => ")",