The R5RS spec for strings is pretty simple: the only escapes it defines are \" and \\. On top of those, support for \n, \r and \t would be nice.

<string> → " <string element>* "
<string element> → <any character other than " or \ > | \" | \\
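To make the grammar concrete, here is a minimal std-only recognizer for exactly these two productions. It is a sketch for illustration, not part of the parser built below; `is_r5rs_string` is a hypothetical name, and it deliberately rejects \n, \r and \t because plain R5RS does not define them.

```rust
/// Returns true if `s` is a complete string literal per the R5RS grammar:
/// an opening `"`, any number of string elements, then a closing `"`.
/// A string element is any character except `"` or `\`, or one of the
/// two escapes `\"` and `\\`.
fn is_r5rs_string(s: &str) -> bool {
    let mut chars = s.chars();
    // The literal must open with a double quote.
    if chars.next() != Some('"') {
        return false;
    }
    loop {
        match chars.next() {
            // Unescaped closing quote: valid only if nothing follows it.
            Some('"') => return chars.next().is_none(),
            // Backslash: R5RS only allows \" and \\.
            Some('\\') => match chars.next() {
                Some('"') | Some('\\') => {}
                _ => return false,
            },
            // Any other character is a plain string element.
            Some(_) => {}
            // Input ended before the closing quote.
            None => return false,
        }
    }
}
```

For example, `"back\\slash"` is accepted, while `"tab\t"` is rejected, which is exactly the gap the \n, \r and \t extension fills.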

nom seems to have two options to handle escaped strings:

Let’s use the latter, escaped_transform!, because its example code already does 80% of what we want.
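The reason a transforming macro fits better: nom's escaped! only recognizes an escaped string and hands back the matched input as is, so the backslashes are still in the result. The std-only sketch below mimics that recognize-only behaviour to show the problem; `recognize_escaped` is a hypothetical helper, and it returns a plain tuple instead of nom's IResult.

```rust
/// Recognize-only, in the spirit of nom's `escaped!`: find where the string
/// body ends (the first `"` not consumed by a preceding `\`) and return that
/// part of the input unchanged. Returns (body, rest); `rest` starts at the
/// closing quote, or is empty if there is none.
fn recognize_escaped(input: &str) -> (&str, &str) {
    let mut after_backslash = false;
    for (i, c) in input.char_indices() {
        if after_backslash {
            // This character belongs to an escape sequence; skip it.
            after_backslash = false;
        } else if c == '\\' {
            after_backslash = true;
        } else if c == '"' {
            // Unescaped quote: the body ends here.
            return (&input[..i], &input[i..]);
        }
    }
    // No closing quote found: everything is body.
    (input, "")
}
```

Called on `say \"hi\"" tail`, this yields the body `say \"hi\"` with the escapes still present, so a second decoding pass would be needed. escaped_transform! does the recognizing and the decoding in one step.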

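// Convert the bytes produced by escaped_transform! into a String;
// invalid UTF-8 is replaced with U+FFFD instead of failing.
fn to_s(i: Vec<u8>) -> String {
  String::from_utf8_lossy(&i).into_owned()
}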
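Because to_s uses String::from_utf8_lossy, a stray invalid byte silently becomes U+FFFD rather than an error. If rejecting such input is preferable, a strict variant could use String::from_utf8 instead. The sketch below puts the two side by side; `to_s_strict` is a hypothetical name, not something the parser uses.

```rust
/// Lossy conversion, as in `to_s`: invalid UTF-8 becomes U+FFFD.
fn to_s(i: Vec<u8>) -> String {
    String::from_utf8_lossy(&i).into_owned()
}

/// Strict alternative (hypothetical): returns None when the bytes
/// are not valid UTF-8 instead of substituting a replacement character.
fn to_s_strict(i: Vec<u8>) -> Option<String> {
    String::from_utf8(i).ok()
}
```

With the bytes `a`, `0xFF`, `b`, to_s returns "a\u{FFFD}b" while to_s_strict returns None. Which one is right depends on whether malformed source text should be a parse error.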

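named!(
    string_content<String>,
    map!(
        escaped_transform!(
            // copy everything up to the next `"` or `\` unchanged
            take_until_either!("\"\\"),
            // `\` introduces an escape sequence
            '\\',
            // replace the character after `\` with what it stands for
            alt!(
                tag!("\\") => { |_| &b"\\"[..] } |
                tag!("\"") => { |_| &b"\""[..] } |
                tag!("n") => { |_| &b"\n"[..] } |
                tag!("r") => { |_| &b"\r"[..] } |
                tag!("t") => { |_| &b"\t"[..] }
            )
        ),
        // escaped_transform! yields a Vec<u8>; turn it into a String
        to_s
    )
);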

The only changes to the example are using take_until_either!("\"\\") instead of alpha, so that any characters are matched until either a \ or a " appears, and adding support for \r and \t.
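To see what escaped_transform! is doing here, this is a rough std-only equivalent of string_content. It is a sketch under simplifying assumptions: it works on &str rather than bytes, returns Option<(String, &str)> instead of nom's IResult, simply stops at end of input rather than modelling nom's Incomplete handling, and returns None for an unknown escape.

```rust
/// Std-only sketch of `string_content`: copy runs of ordinary characters
/// (the `take_until_either!("\"\\")` part) and decode each escape that
/// starts with `\` using the same table as the `alt!` above.
/// Returns the decoded text plus the unconsumed input.
fn string_content(input: &str) -> Option<(String, &str)> {
    let mut out = String::new();
    let mut rest = input;
    loop {
        // Take everything up to the next `"` or `\` (or the end of input).
        let end = rest
            .find(|c: char| c == '"' || c == '\\')
            .unwrap_or(rest.len());
        out.push_str(&rest[..end]);
        rest = &rest[end..];
        if !rest.starts_with('\\') {
            // A closing quote or end of input: stop and leave it unconsumed.
            return Some((out, rest));
        }
        // Decode the character after the backslash.
        let mut chars = rest[1..].chars();
        let replacement = match chars.next()? {
            '\\' => '\\',
            '"' => '"',
            'n' => '\n',
            'r' => '\r',
            't' => '\t',
            _ => return None, // unknown escape sequence
        };
        out.push(replacement);
        rest = chars.as_str();
    }
}
```

Fed the body of the second REPL test, `test123 \n\t\r\"\"\\` followed by the closing `"`, it returns the decoded text and leaves that closing quote for the caller.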

With this parser for the content between the quotes in place, the next step is to make sure strings are surrounded by "s. delimited! is similar to the earlier preceded! and does exactly that; it takes three parsers

  • opening delimiter
  • body
  • closing delimiter

and returns only the result for the body.
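The idea behind delimited! can be sketched without nom as a plain function over parser closures. This is an illustration under assumptions: a "parser" here is any function from &str to Option<(value, rest)>, and `delimited`, `tag` and `until_quote` are hypothetical std-only stand-ins, not nom's macros.

```rust
/// Run `open`, `body` and `close` in sequence and keep only the body's value.
/// If any of the three fails, the whole thing fails.
fn delimited<'a, A, B, C>(
    input: &'a str,
    open: impl Fn(&'a str) -> Option<(A, &'a str)>,
    body: impl Fn(&'a str) -> Option<(B, &'a str)>,
    close: impl Fn(&'a str) -> Option<(C, &'a str)>,
) -> Option<(B, &'a str)> {
    let (_, rest) = open(input)?;
    let (value, rest) = body(rest)?;
    let (_, rest) = close(rest)?;
    Some((value, rest))
}

/// Match a fixed prefix, roughly what `tag!` does.
fn tag<'a>(expected: &'static str) -> impl Fn(&'a str) -> Option<(&'a str, &'a str)> {
    move |input: &'a str| {
        if input.starts_with(expected) {
            Some(input.split_at(expected.len()))
        } else {
            None
        }
    }
}

/// Simplified body parser: everything up to the next `"` (no escapes).
fn until_quote(input: &str) -> Option<(&str, &str)> {
    let end = input.find('"')?;
    Some(input.split_at(end))
}
```

`delimited(r#""seems to work" rest"#, tag("\""), until_quote, tag("\""))` yields the body "seems to work" and the remaining input " rest"; the two quote results are discarded, which is the point of the combinator.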

named!(string<String>,
    delimited!(tag!("\""), string_content, tag!("\""))
);

Now we only need to add a String variant to the Token enum and the string parser to the token parser, and everything should work.

#[derive(Debug, PartialEq)]
enum Token {
    Keyword(SyntacticKeyword),
    Number(i64),
    Boolean(bool),
    Character(char),
    String(String),
}

named!(
    token<Token>,
    alt!(
        syntactic_keyword => { |kw| Token::Keyword(kw) } |
        integer           => { |i| Token::Number(i) } |
        boolean           => { |b| Token::Boolean(b) } |
        character         => { |c| Token::Character(c) } |
        string            => { |s| Token::String(s) }
    )
);

>> "seems to work"
Parsed Done([], String("seems to work"))
>> "test123 \n\t\r\"\"\\"
Parsed Done([], String("test123 \n\t\r\"\"\\"))
>>
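The alt! in token simply tries each alternative in order and returns the first one that succeeds, so order matters whenever two parsers could match the same prefix. A std-only sketch of that dispatch follows; it is an assumption-laden simplification, not the repository's code: the Keyword and Character variants are omitted, and `integer`, `boolean` and `string` are hypothetical stand-ins returning Option instead of IResult.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Number(i64),
    Boolean(bool),
    String(String),
}

type Parser = fn(&str) -> Option<(Token, &str)>;

/// Hypothetical stand-in for `integer`: one or more ASCII digits.
fn integer(input: &str) -> Option<(Token, &str)> {
    let end = input.find(|c: char| !c.is_ascii_digit()).unwrap_or(input.len());
    let n: i64 = input[..end].parse().ok()?;
    Some((Token::Number(n), &input[end..]))
}

/// Hypothetical stand-in for `boolean`: `#t` or `#f`.
fn boolean(input: &str) -> Option<(Token, &str)> {
    if let Some(rest) = input.strip_prefix("#t") {
        Some((Token::Boolean(true), rest))
    } else if let Some(rest) = input.strip_prefix("#f") {
        Some((Token::Boolean(false), rest))
    } else {
        None
    }
}

/// Hypothetical stand-in for `string`: quoted text without escapes.
fn string(input: &str) -> Option<(Token, &str)> {
    let body = input.strip_prefix('"')?;
    let end = body.find('"')?;
    Some((Token::String(body[..end].to_string()), &body[end + 1..]))
}

/// Like `alt!`: try each parser in order, return the first success.
fn token(input: &str) -> Option<(Token, &str)> {
    let parsers: [Parser; 3] = [integer, boolean, string];
    parsers.iter().find_map(|p| p(input))
}
```

For instance, `token("\"seems to work\"")` falls through integer and boolean and ends with `Token::String("seems to work")` and no remaining input, mirroring the first REPL line above.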

Full source code: l3kn/r5rs-parser.