Spec

In addition to decimal integers like those we handled in Part 2, R5RS includes literals for binary, octal and hexadecimal numbers.

  • #b11 (binary)
  • #o17 (octal)
  • #d19 (decimal)
  • #xaf (hexadecimal)

They are made up of a radix specifier (#b, #o, #d, #x, none), a sign (+, -, none) and a non-empty sequence of digits in the given radix.

If there is no radix specifier, the default radix is 10, and if there is no sign, the integer is positive (obviously).
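To make the three parts concrete, here is a small plain-Rust sketch (no nom) that splits a literal into radix, sign and digits. The name split_literal is hypothetical. The sketch only checks the rules listed above and ignores the rest of R5RS number syntax.

```rust
/// Hypothetical helper (not part of this post's parser): split an integer
/// literal into (radix, sign, digits) following the rules above.
fn split_literal(s: &str) -> Option<(u32, &str, &str)> {
    // Radix specifier: #b, #o, #d, #x or none (defaults to 10).
    let (radix, rest) = if let Some(r) = s.strip_prefix("#b") {
        (2, r)
    } else if let Some(r) = s.strip_prefix("#o") {
        (8, r)
    } else if let Some(r) = s.strip_prefix("#d") {
        (10, r)
    } else if let Some(r) = s.strip_prefix("#x") {
        (16, r)
    } else {
        (10, s)
    };
    // Sign: +, - or none (no sign means positive).
    let (sign, digits) = match rest.as_bytes().first() {
        Some(b'+') | Some(b'-') => rest.split_at(1),
        _ => ("", rest),
    };
    // The digit sequence must be non-empty and every digit valid in the radix.
    if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
        return None;
    }
    Some((radix, sign, digits))
}
```

For example, "#b+11" splits into radix 2, sign "+" and digits "11", while "#oFF" is rejected because F is not an octal digit.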

Digit Sequences

In the last part, we used nom::digit to match sequences of decimal digits.

There are two other variants of this: nom::oct_digit and nom::hex_digit.

Sadly, there is no bin_digit, so we need to write it ourselves.

Looking through the list of nom macros, one might assume something like many1!(one_of!("01")) would be a good way to do so, but many1!(...) returns a Vec of results instead of just the matching sequence of bytes.

take_while! sounds more like what we want, and it has a variant, take_while1!, that only matches non-empty sequences:

[…] returns the longest list of bytes for which the function is true. […]

The signature of take_while! looks like this:

take_while!(T -> bool) => &[T] -> IResult<&[T], &[T]>

We are working with byte slices, so T is u8, and what we need is a function that takes a u8 byte and returns true iff it is a binary digit.

fn is_bin_digit(byte: u8) -> bool {
  // Just '0' would be a char;
  // putting b in front makes it a byte (u8)
  byte == b'0' || byte == b'1'
}

Now we can build our own bin_digit parser:

named!(bin_digit, take_while1!(is_bin_digit));
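To see what take_while1! does with is_bin_digit, here is a hand-written stand-in in plain Rust. It is an assumption for illustration, not nom's implementation: it returns the longest non-empty prefix whose bytes all satisfy the predicate, as a sub-slice of the input (no Vec is built, unlike many1!), together with the remaining input.

```rust
/// Byte predicate from the post: true iff `byte` is an ASCII '0' or '1'.
fn is_bin_digit(byte: u8) -> bool {
    byte == b'0' || byte == b'1'
}

/// Hand-written stand-in for take_while1! (not nom's code): split off the
/// longest non-empty prefix whose bytes all satisfy `pred`.
/// Returns (remaining, matched), the same order as nom's Done(rest, output).
fn take_while1<'a>(input: &'a [u8], pred: fn(u8) -> bool) -> Option<(&'a [u8], &'a [u8])> {
    // Index of the first byte that fails the predicate, or the end of input.
    let end = input.iter().position(|&b| !pred(b)).unwrap_or(input.len());
    if end == 0 {
        None // at least one matching byte is required
    } else {
        Some((&input[end..], &input[..end]))
    }
}
```

With input b"1012" this yields the matched slice b"101" and leaves b"2", while b"2" alone fails because the match would be empty.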

More Signs

In addition to -, + can be used as a sign, too, so we need a way to handle this.

To keep the integer parsers as DRY as possible, we’ll extract this into its own parser:

named!(sign, recognize!(opt!(one_of!("+-"))));

The only new thing here is one_of!(str). According to the docs, it …

… matches one of the provided characters. one_of!("abc") could recognize ‘a’, ‘b’, or ‘c’.

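Using opt!(one_of!("+-")) on its own would give us an Option wrapping the matched character instead of a sequence of bytes, which doesn’t fit the &[u8] output that named! assumes when no output type is given. Wrapping recognize! around it returns the bytes the inner parser consumed instead (an empty slice if there is no sign), so sign can be used inside do_parse!(sign >> digit >> ()) just like digit.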
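The behaviour of recognize!(opt!(one_of!("+-"))) can be sketched by hand in plain Rust. The function below is a hypothetical illustration, not what nom generates: it always succeeds and hands back the sign as a byte slice that is either one byte long or empty.

```rust
/// Hand-written sketch of recognize!(opt!(one_of!("+-"))):
/// returns (remaining, matched sign) where the sign slice may be empty.
fn sign(input: &[u8]) -> (&[u8], &[u8]) {
    match input.first() {
        // One sign byte: return it as a one-byte slice.
        Some(b'+') | Some(b'-') => (&input[1..], &input[..1]),
        // No sign: succeed anyway with an empty slice, like opt! does.
        _ => (input, &input[..0]),
    }
}
```

Because the "no sign" case is an empty slice rather than a None, the caller can treat both cases the same way.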

Parsing Numbers with Radix

Next we need some way to parse these digit sequences. str::parse::<i64>() won’t do this time, because there is no way to tell it which radix (2, 8, 10 or 16) to use.

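Instead, we can use i64::from_str_radix(src: &str, radix: u32), which also returns a Result, so we can just swap it in for the parsing function inside the map_res! from Part 2. Conveniently, from_str_radix accepts an optional leading + or - for signed types, so the slice we pass it can include the sign.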
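A quick plain-Rust check of how i64::from_str_radix treats signs, letter case and invalid digits. The wrapper name parse_in_radix is hypothetical; the only real API used is i64::from_str_radix.

```rust
/// Hypothetical wrapper around i64::from_str_radix, for illustration.
fn parse_in_radix(s: &str, radix: u32) -> Option<i64> {
    // Unlike str::parse::<i64>(), which always assumes base 10, the radix is
    // explicit here. For signed types one leading '+' or '-' is accepted, and
    // for radixes above 10 letters of either case count as digits.
    i64::from_str_radix(s, radix).ok()
}
```

So "+FF" and "Ff" in radix 16 both give 255, "-101010" in radix 2 gives -42, and "FF" in radix 8 is an error.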

Doing this for all new variants (and for decimal integers, to keep things consistent), we can build new parsers integer_literal2, integer_literal8, …, that match signed binary, octal, decimal and hexadecimal numbers.

// Top of the file
use nom::{digit, oct_digit, hex_digit};

// ...

named!(
    integer_literal2,
    recognize!(do_parse!(sign >> bin_digit >> ()))
);

named!(
    integer_literal8,
    recognize!(do_parse!(sign >> oct_digit >> ()))
);

named!(
    integer_literal10,
    recognize!(do_parse!(sign >> digit >> ()))
);

named!(
    integer_literal16,
    recognize!(do_parse!(sign >> hex_digit >> ()))
);
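The four recognizers differ only in their digit parser. As a plain-Rust sketch (not nom, and not how the post structures its code), the same idea can take the digit test as a parameter. The name integer_literal is hypothetical, and it returns the whole matched slice, sign included, the way recognize! does.

```rust
/// Hypothetical generic version of integer_literal2/8/10/16: an optional
/// sign followed by one or more bytes accepted by `is_digit`.
/// Returns (remaining, whole matched slice including the sign).
fn integer_literal<'a>(input: &'a [u8], is_digit: fn(u8) -> bool) -> Option<(&'a [u8], &'a [u8])> {
    // Optional sign: consume at most one '+' or '-'.
    let sign_len = match input.first() {
        Some(b'+') | Some(b'-') => 1,
        _ => 0,
    };
    // Digits: the longest run after the sign; at least one is required.
    let digits_len = input[sign_len..]
        .iter()
        .take_while(|&&b| is_digit(b))
        .count();
    if digits_len == 0 {
        return None;
    }
    let end = sign_len + digits_len;
    Some((&input[end..], &input[..end]))
}
```

Passing a closure such as |b: u8| b.is_ascii_hexdigit() plays the role of hex_digit; a lone "+" is rejected because no digits follow it.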

Based on integer_literal2, integer_literal8, …, we can now build parsers that return i64s…

named!(
    integer2<i64>,
    map_res!(
        map_res!(integer_literal2, std::str::from_utf8),
        |s| i64::from_str_radix(s, 2)
    )
);

named!(
    integer8<i64>,
    map_res!(
        map_res!(integer_literal8, std::str::from_utf8),
        |s| i64::from_str_radix(s, 8)
    )
);

named!(
    integer10<i64>,
    map_res!(
        map_res!(integer_literal10, std::str::from_utf8),
        |s| i64::from_str_radix(s, 10)
    )
);

named!(
    integer16<i64>,
    map_res!(
        map_res!(integer_literal16, std::str::from_utf8),
        |s| i64::from_str_radix(s, 16)
    )
);
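Each of these parsers chains two fallible conversions: bytes to &str, then &str to i64. In plain Rust the same chain can be written with ? on Option. The name bytes_to_i64 is hypothetical; the point is that either step can fail (invalid UTF-8, or a value that does not fit into an i64), and map_res! turns such a failure into a parse error.

```rust
use std::str;

/// Hand-written equivalent of the nested map_res! calls (illustration only).
fn bytes_to_i64(bytes: &[u8], radix: u32) -> Option<i64> {
    let s = str::from_utf8(bytes).ok()?; // inner map_res!: &[u8] -> &str
    i64::from_str_radix(s, radix).ok() // outer map_res!: &str -> i64
}
```

Note that a digit sequence can be syntactically valid and still fail here: "9223372036854775808" is one more than i64::MAX.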

Finally, we need to combine all the parsers above into one parser that can handle all kinds of integers, choosing one of the subparsers depending on the number’s radix specifier.

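nom provides an elegant way to do this: preceded! takes two parsers, applies the first one, discards its result and returns the result of the second one (if either fails, preceded! fails). alt! then tries its alternatives from left to right and returns the first one that succeeds.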

Remember that #d is optional, so we have to use opt! there.
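The preceded!/opt! combination used for #d can be mimicked in plain Rust. This is a simplified stand-in, not nom's API: parsers are plain functions returning Some((remaining, output)) or None, and tag_d, opt_tag_d and decimal are hypothetical helpers.

```rust
/// preceded!-style combinator: run `first`, throw its output away,
/// then run `second` on the remaining input and return its result.
fn preceded<'a, A, B>(
    input: &'a str,
    first: fn(&str) -> Option<(&str, A)>,
    second: fn(&str) -> Option<(&str, B)>,
) -> Option<(&'a str, B)> {
    let (rest, _ignored) = first(input)?;
    second(rest)
}

/// Like tag!("#d"): succeeds only if the input starts with "#d".
fn tag_d(input: &str) -> Option<(&str, ())> {
    input.strip_prefix("#d").map(|rest| (rest, ()))
}

/// Like opt!(tag!("#d")): always succeeds, reporting whether "#d" was there.
fn opt_tag_d(input: &str) -> Option<(&str, bool)> {
    match tag_d(input) {
        Some((rest, ())) => Some((rest, true)),
        None => Some((input, false)),
    }
}

/// Unsigned decimal digits parsed to i64 (no sign, to keep the sketch short).
fn decimal(input: &str) -> Option<(&str, i64)> {
    let end = input.bytes().take_while(|b| b.is_ascii_digit()).count();
    let value: i64 = input[..end].parse().ok()?; // fails on an empty digit run
    Some((&input[end..], value))
}
```

With opt_tag_d as the first parser, both "#d19" and "19" parse to 19; with the plain tag_d, "19" fails because the prefix is required.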

named!(
    integer<i64>,
    alt!(
        preceded!(tag!("#b"), integer2) |
        preceded!(tag!("#o"), integer8) |
        preceded!(opt!(tag!("#d")), integer10) |
        preceded!(tag!("#x"), integer16)
    )
);
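For comparison, here is the same ordered dispatch as a plain-Rust sketch, built from Option::or_else rather than nom. The names integer, preceded_radix and signed_digits are hypothetical. The branches are tried in the same order as the alt! above, and the #d branch skips the prefix only if it is present.

```rust
/// Plain-Rust sketch of the alt! above: returns (remaining input, value).
fn integer(input: &str) -> Option<(&str, i64)> {
    preceded_radix(input, "#b", 2)
        .or_else(|| preceded_radix(input, "#o", 8))
        .or_else(|| {
            // opt!(tag!("#d")): skip "#d" if present, otherwise keep the input.
            let rest = input.strip_prefix("#d").unwrap_or(input);
            signed_digits(rest, 10)
        })
        .or_else(|| preceded_radix(input, "#x", 16))
}

/// Require `prefix`, then parse a signed number in `radix`.
fn preceded_radix<'a>(input: &'a str, prefix: &str, radix: u32) -> Option<(&'a str, i64)> {
    signed_digits(input.strip_prefix(prefix)?, radix)
}

/// Optional sign, then the longest non-empty run of digits valid in `radix`,
/// converted with i64::from_str_radix (which accepts the sign).
fn signed_digits(input: &str, radix: u32) -> Option<(&str, i64)> {
    let sign_len = if input.starts_with('+') || input.starts_with('-') { 1 } else { 0 };
    let digits_len = input[sign_len..]
        .chars()
        .take_while(|c| c.is_digit(radix))
        .count();
    if digits_len == 0 {
        return None;
    }
    // Digits accepted by is_digit are ASCII, so the char count equals the byte count.
    let end = sign_len + digits_len;
    let value = i64::from_str_radix(&input[..end], radix).ok()?;
    Some((&input[end..], value))
}
```

Like the nom version, "#oFF" fails in every branch: #o matches but F is not an octal digit, and the other branches reject the leading #o.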

Now fire up the REPL to check if everything works as expected:

>> 123
Parsed Done([], 123)
>> +123
Parsed Done([], 123)
>> #x+FF
Parsed Done([], 255)
>> #x+Ff
Parsed Done([], 255)
>> #b101010
Parsed Done([], 42)
>> #oFF
Parsed Error(Alt)

There is a lot of code duplication going on above, but I don’t want to get into macros just now, so let’s call it a day.

Full source code: l3kn/r5rs-parser.