Booleans

In R5RS there are two boolean literals, #t for true and #f for false.

With our newly acquired nom skills, this should be easy:

// We name this parser boolean rather than bool: bool is not a keyword,
// but it is already the name of Rust's primitive boolean type
named!(
    boolean<bool>,
    alt!(
        tag!("#t") => { |_| true } |
        tag!("#f") => { |_| false }
    )
);

Chars

Chars are not that complex either. There are just three cases to handle:

  • #\space as alias for ' '
  • #\newline as alias for '\n'
  • #\<any char>

All of these cases begin with #\, so preceded!(tag!("#\\"), ...) would be a good start.

// At the top of the file.
// The digit parsers are needed for the number parsers from earlier parts.
use nom::{digit, oct_digit, hex_digit, anychar};

// ...

named!(
    character<char>,
    preceded!(
        tag!("#\\"),
        alt_complete!(
            tag!("space") => { |_| ' ' } |
            tag!("newline") => { |_| '\n' } |
            anychar
        )
    )
);

For the first two cases, we just match on the names with tag! and use the => syntax to return the right char. The third case is surprisingly easy as well because nom::anychar does exactly what we want: to match any character and return a char.

Again, we need to use alt_complete! instead of alt! and put anychar at the end of the chain. If anychar came first, #\space would get parsed as #\s, and with plain alt!, #\s would be reported as an Incomplete #\space.
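
To make the ordering argument concrete, here is a minimal hand-written sketch of the same three cases using only the standard library. It is not how nom implements alt_complete!, and parse_character is a hypothetical helper that is not part of the parser in this series. It works on &str instead of &[u8] to keep the example short.

```rust
// Hand-written sketch (no nom) of the three character cases.
// `parse_character` is a hypothetical helper used only for illustration.
// On success it returns the parsed char and the unparsed rest of the input.
fn parse_character(input: &str) -> Option<(char, &str)> {
    // Every character literal starts with `#\`.
    let rest = input.strip_prefix("#\\")?;

    // Try the named characters first ...
    if let Some(rest) = rest.strip_prefix("space") {
        return Some((' ', rest));
    }
    if let Some(rest) = rest.strip_prefix("newline") {
        return Some(('\n', rest));
    }

    // ... and only then fall back to "any single character".
    let mut chars = rest.chars();
    let c = chars.next()?;
    Some((c, chars.as_str()))
}
```

If the fallback were tried before the named characters, parse_character("#\\space") would return 's' with "pace" left over, which is the first failure mode described above.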

Combining Types

In the end, we want a parser that can handle all kinds of Scheme values and returns a wrapper type.

For now, there are just four cases to handle:

  1. Keywords, from Part 1
  2. Numbers, from Parts 2 & 3
  3. Booleans
  4. Characters

#[derive(Debug)]
enum Token {
    Keyword(SyntacticKeyword),
    Number(i64),
    Boolean(bool),
    Character(char),
}

In addition to that new wrapper type, we need a new parser that combines the parsers for all types and wraps the results in the corresponding Token type.

named!(
    token<Token>,
    alt!(
        syntactic_keyword => { |kw| Token::Keyword(kw) } |
        integer => { |i| Token::Number(i) } |
        boolean => { |b| Token::Boolean(b) } |
        character => { |c| Token::Character(c) }
    )
);

fn parse(line: &str) {
    let res = token(line.as_bytes());
    println!("Parsed {:#?}", res);
}
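
For readers who want to see the "try each parser in turn, then wrap the result" idea without the macros, here is a rough std-only sketch. All names in it (SketchToken, sketch_boolean, sketch_character, sketch_token) are hypothetical stand-ins. It covers only two of the four cases and uses Option instead of nom's IResult.

```rust
// Stand-in types and parsers. All names are hypothetical and only two of
// the four token kinds are covered.
#[derive(Debug, PartialEq)]
enum SketchToken {
    Boolean(bool),
    Character(char),
}

// Recognizes `#t` or `#f` at the start of the input.
fn sketch_boolean(input: &str) -> Option<(bool, &str)> {
    if let Some(rest) = input.strip_prefix("#t") {
        Some((true, rest))
    } else if let Some(rest) = input.strip_prefix("#f") {
        Some((false, rest))
    } else {
        None
    }
}

// Recognizes `#\` followed by any single character (named characters omitted).
fn sketch_character(input: &str) -> Option<(char, &str)> {
    let rest = input.strip_prefix("#\\")?;
    let mut chars = rest.chars();
    let c = chars.next()?;
    Some((c, chars.as_str()))
}

// Tries each parser in order and wraps the first success in the matching
// variant, roughly what the `parser => { closure }` arms express.
fn sketch_token(input: &str) -> Option<(SketchToken, &str)> {
    sketch_boolean(input)
        .map(|(b, rest)| (SketchToken::Boolean(b), rest))
        .or_else(|| {
            sketch_character(input).map(|(c, rest)| (SketchToken::Character(c), rest))
        })
}
```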

Testing

An assertion might look like this:

assert_eq!(boolean("#t".as_bytes()), nom::IResult::Done(&b""[..], true));
assert_eq!(boolean("#f".as_bytes()), nom::IResult::Done(&b""[..], false));

There is a lot of boilerplate code here: the input has to be a &[u8] rather than a &str, and because we expect the input to be parsed fully, the first part of Done is an empty &[u8] (which we get with &b""[..]).
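
The &b""[..] part can look odd at first. As a small illustration (the helper names empty_remainder and as_input are made up for this example), the following shows where the slice comes from and that it matches what as_bytes() produces:

```rust
/// The "nothing left to parse" remainder, as a byte slice.
/// (Hypothetical helper, only for illustration.)
fn empty_remainder() -> &'static [u8] {
    // `b""` on its own has type `&[u8; 0]`, a reference to a fixed-size array.
    // Slicing it with `[..]` gives a `&[u8]`, the type the parsers work with.
    &b""[..]
}

/// Converts a string into the `&[u8]` input a byte-based parser expects.
/// (Hypothetical helper, only for illustration.)
fn as_input(text: &str) -> &[u8] {
    text.as_bytes()
}
```

Both sides of assert_eq! have to agree on their types, which is presumably why the plain b"" literal is not enough in the expected value.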

A nice fix is to write a macro that takes the parts we care about (parser, input string, output value) and fills in the rest:

macro_rules! assert_parsed_fully {
    ($parser:expr, $input:expr, $result:expr) => {
        assert_eq!(
            $parser($input.as_bytes()),
            nom::IResult::Done(&b""[..], $result)
        );
    }
}

Now we can write tests in a much cleaner way:

#[test] // Marks a function as a unit test; run the tests with `cargo test`
fn test_bool() {
    assert_parsed_fully!(boolean, "#t", true);
    assert_parsed_fully!(boolean, "#f", false);
}

#[test]
fn test_character() {
    assert_parsed_fully!(character, "#\\space", ' ');
    assert_parsed_fully!(character, "#\\newline", '\n');
    assert_parsed_fully!(character, "#\\ ", ' ');
    assert_parsed_fully!(character, "#\\X", 'X');
}

#[test]
fn test_integer() {
    assert_parsed_fully!(integer, "1", 1);
    assert_parsed_fully!(integer, "#d+1", 1);
    assert_parsed_fully!(integer, "-1", -1);
    assert_parsed_fully!(integer, "#b010101", 21);
    assert_parsed_fully!(integer, "#o77", 63);
    assert_parsed_fully!(integer, "#xFF", 255);
    assert_parsed_fully!(integer, "#x-ff", -255);
}

In order to use assert_eq! on Tokens, we need to define a way to test if two of them are equal, formalized in the PartialEq trait.

We won’t use Eq here, because in the future there might be some tokens (e.g. NaN) where the equivalence relation is not reflexive (v != v for some token v).
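
A small sketch of why that matters, assuming a future token that holds a floating point number. FloatToken, its Real variant and is_reflexive are hypothetical; the Token type in this post has no such variant yet.

```rust
// `Real` is a hypothetical future variant, not part of Token in this post.
// Deriving PartialEq works because f64 implements PartialEq.
// Adding `Eq` to the derive list would not compile, since f64 is not Eq.
#[derive(Debug, PartialEq)]
enum FloatToken {
    Real(f64),
}

/// Returns true if the token compares equal to itself.
fn is_reflexive(token: &FloatToken) -> bool {
    token == token
}
```

For an ordinary value such as FloatToken::Real(1.5), is_reflexive returns true. For FloatToken::Real(f64::NAN) it returns false, because NaN is not equal to itself.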

Just like the Debug trait, we can make Rust derive PartialEq automatically by adding it to the #[derive(...)] above Token, SyntacticKeyword and ExpressionKeyword.

#[derive(Debug, PartialEq)]
enum Token {
  // ..
}

Now assert_parsed_fully! works for tokens, too.

#[test]
fn test_token() {
    assert_parsed_fully!(token, "1", Token::Number(1));
    assert_parsed_fully!(token, "else", Token::Keyword(SyntacticKeyword::Else));
    assert_parsed_fully!(token, "lambda", Token::Keyword(
        SyntacticKeyword::Expression(ExpressionKeyword::Lambda))
    );
    assert_parsed_fully!(token, "#\\space", Token::Character(' '));
    // ...
}

I’ll leave coming up with more test cases as an exercise for the reader. If you find a case that does not work as expected, feel free to open up an issue.

Full source code: l3kn/r5rs-parser.