Mapping over Results

Nom works with slices of bytes (&[u8]) so we need some way to convert these to strings and then parse them into integers.

Rust already provides a method for the first part: std::str::from_utf8.

Its type signature looks like this:

&[u8] -> Result<&str, Utf8Error>

All we need is to convert &[u8] -> &str, so what is up with that Result<> thingy around it?

The problem is that there are some byte sequences that are not valid as UTF-8 sequences.

If our string is just made up of bytes from 0 to 127 (ASCII), everything works fine.

fn main() {
    let input = [49, 50, 51]; // ASCII for "123"
    println!("{:?}", std::str::from_utf8(&input));
}
// => Ok("123")

255 is a valid byte value but must never appear in a UTF-8 sequence.

fn main() {
    let input = [49, 50, 51, 255];
    println!("{:?}", std::str::from_utf8(&input));
}
// => Err(Utf8Error { valid_up_to: 3, error_len: Some(1) })

There are a ton of other cases where converting bytes to a string could go wrong, but the one above will have to do for now…
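To get a feeling for a couple of those other cases, here is a small sketch that only uses the standard library. The describe helper is a name I made up for this example; valid_up_to() is the accessor on Utf8Error that reports how many bytes at the start of the input were still valid.

```rust
use std::str::from_utf8;

/// Describe the outcome of decoding `bytes` as UTF-8.
/// (`describe` is a hypothetical helper, not part of std or nom.)
fn describe(bytes: &[u8]) -> String {
    match from_utf8(bytes) {
        Ok(s) => format!("ok: {:?}", s),
        // `valid_up_to` is the length of the longest valid prefix.
        Err(e) => format!("invalid after {} valid byte(s)", e.valid_up_to()),
    }
}

fn main() {
    // Plain ASCII is always valid UTF-8.
    println!("{}", describe(&[49, 50, 51]));
    // 0x80 is a continuation byte with nothing to continue.
    println!("{}", describe(&[49, 0x80]));
    // 0xE2 0x82 starts a three-byte sequence that is cut short.
    println!("{}", describe(&[49, 50, 0xE2, 0x82]));
}
```

In both failing cases the error tells us where the trouble starts, which is the kind of information a parser can pass on to its caller instead of panicking.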

Now that we know why there is a Result<> around the stuff we want, how do we use from_utf8 with nom? map! from Part 1 won’t work and using .unwrap() or .expect() would be very inelegant.

The solution is surprisingly simple: nom already includes map_res!, a variation of map! that works with functions that return Results. It also provides nom::digit, a parser that recognizes one or more of the characters ‘0’…‘9’.

named!(
    integer<&str>,
    map_res!(nom::digit, std::str::from_utf8)
);
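If the macro feels a bit magical, the idea behind it can be sketched by hand without nom. This is not how nom implements map_res!, just my rough approximation of the behaviour: run a parser, then apply a function that may fail to its output. The names leading_digits and digits_as_str are invented for this example, and I use Option instead of nom's result type to keep it short.

```rust
use std::str::from_utf8;

/// Split `input` into its leading ASCII digits and the rest.
/// Returns `None` when there is not at least one digit.
/// (Hypothetical stand-in for `nom::digit`.)
fn leading_digits(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let end = input
        .iter()
        .position(|b| !b.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        // Same order as nom's Done: remaining input first, then the match.
        Some((&input[end..], &input[..end]))
    }
}

/// Hand-rolled counterpart of `map_res!(digit, from_utf8)`:
/// run the parser, then apply a fallible function to its output.
fn digits_as_str(input: &[u8]) -> Option<(&[u8], &str)> {
    let (rest, digits) = leading_digits(input)?;
    let s = from_utf8(digits).ok()?;
    Some((rest, s))
}

fn main() {
    println!("{:?}", digits_as_str(b"123abc"));
    println!("{:?}", digits_as_str(b"abc"));
}
```

The important part is that a failure in either step turns into a failed parse instead of a panic.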

Parsing Integers

Of course, &str is not what we really want. We still need to parse it into one of the integer types, for now just i64.

One way to do this is to use str.parse::<i64>() [1], which returns a Result too, so we need to use map_res! again.

named!(
    integer<i64>,
    map_res!(
      map_res!(nom::digit, std::str::from_utf8),
      |s: &str| s.parse::<i64>()
    )
);

Rust seems to have a hard time figuring out the type of s inside the closure (for good reasons, I am sure), so we need to set it to &str by hand.
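Outside of nom, the same two fallible steps can be chained with the standard library alone. The following is only a sketch: bytes_to_i64 is a made-up helper, and it throws away the error details by converting each Result to an Option. It also shows the two ways of telling parse() which type we want.

```rust
use std::str::from_utf8;

/// Convert raw bytes to an `i64` in two fallible steps.
/// (`bytes_to_i64` is a hypothetical helper for this example.)
fn bytes_to_i64(bytes: &[u8]) -> Option<i64> {
    from_utf8(bytes)
        .ok()
        // The closure argument is annotated, as in the nom version.
        .and_then(|s: &str| s.parse::<i64>().ok())
}

fn main() {
    // Turbofish: the target type is named at the call site.
    let a = "42".parse::<i64>();
    // Annotation: the target type comes from the variable.
    let b: Result<i64, _> = "42".parse();
    println!("{:?} {:?}", a, b);

    println!("{:?}", bytes_to_i64(b"0004"));
    println!("{:?}", bytes_to_i64(&[255]));
}
```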

To try out our new parser, just change the parse() function from Part 1 to use it instead of syntactic_keyword.

fn parse(line: &str) {
    // let res = syntactic_keyword(line.as_bytes());
    let res = integer(line.as_bytes());
    println!("Parsed {:#?}", res);
}

Valid values for i64 range from -2^63 to 2^63 - 1 [2], so an easy way to see how map_res! handles errors is to use 2^63 = 9223372036854775808 or higher as input.

>> 1
Parsed Done([], 1)
>> 2
Parsed Done([], 2)
>> 3
Parsed Done([], 3)
>> 0004
Parsed Done([], 4)
>> 9223372036854775808
Parsed Error(MapRes)

Fun fact: this piece of code panics for the same reason, because the largest i64 is one smaller than the negation of the smallest one:

fn main() {
    println!("{:?}",  std::i64::MIN);
    println!("{:?}", -std::i64::MIN);
}
// -9223372036854775808
// thread 'main' panicked at 'attempt to negate with overflow', ...
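For comparison, here is a sketch of the non-panicking way to poke at both edges of the range, using only the standard library. safe_negate is a name I made up; it just wraps checked_neg, which returns None on overflow. As far as I can tell, recent compilers may even refuse to build the snippet above because they notice the overflow at compile time, so the checked variant is also the easier one to experiment with.

```rust
/// Negate `n`, returning `None` instead of panicking on overflow.
/// (`safe_negate` is a hypothetical wrapper around `checked_neg`.)
fn safe_negate(n: i64) -> Option<i64> {
    n.checked_neg()
}

fn main() {
    // The largest and smallest values parse fine.
    println!("{:?}", "9223372036854775807".parse::<i64>());
    println!("{:?}", "-9223372036854775808".parse::<i64>());
    // One past the maximum is reported as an error, not a panic.
    println!("{:?}", "9223372036854775808".parse::<i64>());
    // Negating the minimum has no representable result.
    println!("{:?}", safe_negate(i64::MIN));
    println!("{:?}", safe_negate(i64::MAX));
}
```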

“Show me a Sign”

As a last step, we’ll add support for signed integers (like -42). To do that, we need some way to express “An optional ‘-‘ followed by one or more digits” as a parser.

opt!(parser) makes parser optional, so opt!(tag!("-")) gives us the first part, and we already know that nom::digit matches one or more digits. The only thing missing is some way to chain them together.

do_parse!(opt!(tag!("-")) >> digit >> ()) creates a parser that matches the desired pattern and returns () (no result).

The last piece of the puzzle is recognize!(parser), which returns the part of the input that its child parser consumed, provided that parser was successful.
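To make the "optional ‘-’, then digits, then hand back what was consumed" idea concrete, here is a hand-written sketch of such a recognizer in plain Rust. It is only meant to illustrate the pattern and is not how nom's macros expand; recognize_integer is a name invented for this example, and it returns an Option with the remaining input first and the matched slice second.

```rust
/// Return the part of `input` that matches an optional '-' followed by
/// one or more ASCII digits, together with the unconsumed rest.
/// (`recognize_integer` is a hypothetical helper, not a nom function.)
fn recognize_integer(input: &[u8]) -> Option<(&[u8], &[u8])> {
    // Optional sign: skip one byte if it is '-'.
    let sign_len = if input.first() == Some(&b'-') { 1 } else { 0 };
    // One or more digits after the sign.
    let digit_len = input[sign_len..]
        .iter()
        .take_while(|b| b.is_ascii_digit())
        .count();
    if digit_len == 0 {
        return None;
    }
    let end = sign_len + digit_len;
    Some((&input[end..], &input[..end]))
}

fn main() {
    println!("{:?}", recognize_integer(b"-42 rest"));
    println!("{:?}", recognize_integer(b"123"));
    println!("{:?}", recognize_integer(b"-"));
}
```

Note that a lone "-" is rejected, because the digits are not optional.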

Putting all of them together, we get:

// Top of the file
use nom::{digit};

// ...

named!(
    integer<i64>,
    map_res!(
        map_res!(
            recognize!(
                do_parse!(
                    opt!(tag!("-")) >>
                    digit >>
                    ()
                )
            ),
            std::str::from_utf8
        ),
        |s: &str| s.parse::<i64>()
    )
);

nom has a problem recognizing module paths inside macros, so nom::digit won’t work inside the do_parse!.

Most of the macros take parsers as inputs and return parsers, so we can make our parser less messy by creating a special integer_literal parser.

named!(
    integer_literal,
    recognize!(
        do_parse!(
            opt!(tag!("-")) >>
            digit >>
            ()
        )
    )
);

named!(
    integer<i64>,
    map_res!(
        map_res!(
            integer_literal,
            std::str::from_utf8
        ),
        |s: &str| s.parse::<i64>()
    )
);

>> -123
Parsed Done([], -123)
>> -0
Parsed Done([], 0)
>> 0
Parsed Done([], 0)
>> 123
Parsed Done([], 123)
>>

This will have to do for now. In the next part I’ll try to handle binary, octal and hex numbers.

Full source code: l3kn/r5rs-parser.

  1. ::<i64> is an alternative way of defining which type we want to parse to. Usually this is already set by the type of the variable in an assignment, e.g. let res: i64 = str.parse()

  2. If you are wondering why the range is asymmetric, take a look at how two’s complement is defined.