Update, 2018-12-11

With version 4 of nom, a lot of things changed, so the code snippets won’t work anymore and some descriptions might be outdated.

Strings

The R5RS spec for strings is pretty simple, but in addition to that, support for \n, \r and \t would be nice.

<string> → " <string element>* "
<string element> → <any character other than " or \ > | \" | \\

nom seems to have two options to handle escaped strings:

  • escaped!
  • escaped_transform!

Let’s use the latter one, because its example code already does 80% of what we want.

fn to_s(i:Vec<u8>) -> String {
  String::from_utf8_lossy(&i).into_owned()
}

named!(
    string_content<String>,
    map!(
        escaped_transform!(
            take_until_either!("\"\\"),
            '\\',
            alt!(
                tag!("\\") => { |_| &b"\\"[..] } |
                tag!("\"") => { |_| &b"\""[..] } |
                tag!("n") => { |_| &b"\n"[..] } |
                tag!("r") => { |_| &b"\r"[..] } |
                tag!("t") => { |_| &b"\t"[..] }
            )
        ),
        to_s
    )
);

The only changes to the example are using take_until_either!("\"\\") instead of alpha, so that the parser matches any characters until either a \ or a " appears, and adding support for \r and \t.
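To make it clearer what the transformation does, here is a rough hand-written equivalent that doesn't use nom at all. It is only a sketch: the name unescape_content is made up for this example, it works on &str instead of bytes, and it treats the end of the input as the end of the content, which is a simplification and not necessarily what the nom macros do.

```rust
/// Hand-written sketch of the unescaping step, without nom.
/// `unescape_content` is a hypothetical name, not part of nom.
/// Returns the unescaped content and the remaining input,
/// or None if a backslash is followed by an unsupported character.
fn unescape_content(input: &str) -> Option<(String, &str)> {
    let mut out = String::new();
    let mut chars = input.char_indices();
    while let Some((pos, c)) = chars.next() {
        match c {
            // An unescaped quote ends the content; leave it for the caller.
            '"' => return Some((out, &input[pos..])),
            // A backslash must be followed by one of the supported escapes.
            '\\' => match chars.next() {
                Some((_, '\\')) => out.push('\\'),
                Some((_, '"')) => out.push('"'),
                Some((_, 'n')) => out.push('\n'),
                Some((_, 'r')) => out.push('\r'),
                Some((_, 't')) => out.push('\t'),
                _ => return None,
            },
            // Everything else is copied through unchanged.
            other => out.push(other),
        }
    }
    // Simplification: input ended without a quote, so all of it was content.
    Some((out, ""))
}
```

Calling unescape_content on the text after an opening quote gives back the unescaped string plus the rest of the input, starting at the closing quote.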

Based on this parser for the stuff inside the quotes, we next need a way to make sure there are "s around our strings. delimited! is similar to the earlier preceded! and does just that. It takes three parsers

  • opening delimiter
  • body
  • closing delimiter

and returns only the result for the body.
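The idea behind delimited! can be sketched with plain closures. This is not how nom implements it. The parser shape used here (a function from &str to an optional pair of remaining input and value) and the helpers tag and until_quote are assumptions made up for this example.

```rust
/// Sketch of a "delimited" combinator: run three parsers in sequence
/// and keep only the value produced by the middle one.
/// The parser shape (input -> Option<(rest, value)>) is an assumption
/// for this example, not nom's IResult.
fn delimited<'a, A, B, C>(
    open: impl Fn(&'a str) -> Option<(&'a str, A)>,
    body: impl Fn(&'a str) -> Option<(&'a str, B)>,
    close: impl Fn(&'a str) -> Option<(&'a str, C)>,
) -> impl Fn(&'a str) -> Option<(&'a str, B)> {
    move |input: &'a str| {
        let (rest, _) = open(input)?; // opening delimiter, result dropped
        let (rest, value) = body(rest)?; // body, result kept
        let (rest, _) = close(rest)?; // closing delimiter, result dropped
        Some((rest, value))
    }
}

/// Hypothetical helper: matches a literal prefix, similar in spirit to tag!.
fn tag<'a>(expected: &'static str) -> impl Fn(&'a str) -> Option<(&'a str, &'a str)> {
    move |input: &'a str| {
        input
            .strip_prefix(expected)
            .map(|rest| (rest, &input[..expected.len()]))
    }
}

/// Hypothetical body parser: everything up to the next double quote.
/// It does not handle escapes and fails if there is no quote at all.
fn until_quote(input: &str) -> Option<(&str, &str)> {
    let end = input.find('"')?;
    Some((&input[end..], &input[..end]))
}
```

With these pieces, delimited(tag("\""), until_quote, tag("\"")) builds a parser that accepts "hello" and returns only hello, and that fails as soon as one of the quotes is missing.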

named!(string<String>,
    delimited!(tag!("\""), string_content, tag!("\""))
);

Now we only need to add a string type to the Token enum, the string parser to the token parser and everything should work fine.

#[derive(Debug, PartialEq)]
enum Token {
    Keyword(SyntacticKeyword),
    Number(i64),
    Boolean(bool),
    Character(char),
    String(String),
}

named!(
    token<Token>,
    alt!(
        syntactic_keyword => { |kw| Token::Keyword(kw) } |
        integer           => { |i| Token::Number(i) } |
        boolean           => { |b| Token::Boolean(b) } |
        character         => { |c| Token::Character(c) } |
        string            => { |s| Token::String(s) }
    )
);

A quick test:

>> "seems to work"
Parsed Done([], String("seems to work"))
>> "test123 \n\t\r\"\"\\"
Parsed Done([], String("test123 \n\t\r\"\"\\"))
>>

Full source code: l3kn/r5rs-parser.