
Why does my nom parser not consume the entire input, leaving the last piece unparsed?

Tags: rust, nom

I'm trying to split a log line on spaces and commas to build a Vec of Tokens, each either a Field or a Separator, as shown in the code below.

My problem is that nom doesn't seem to consume the entire log line: it leaves the last part, in this case 08:33:58), unparsed.

main.rs

#![feature(rust_2018_preview)]

#[macro_use] extern crate nom;

#[derive(Debug, PartialEq)]
pub enum Token<'a> {
    Separator(&'a [u8]),
    Field(&'a [u8]),    
}

named!(separator, is_a!(" ,"));

named!(not_sep, is_not!(" ,"));

named!(
    token<Token>,
    alt_complete!(
        separator => { |s| Token::Separator(s) } |
        not_sep =>   { |n| Token::Field(n) }
    )
);

named!(sequence<Vec<Token>>, many1!(token));


pub fn scan(input: &[u8]) -> Vec<Token> {
    let (_, seq) = sequence(input).unwrap();

    seq
}

fn main() {
}

#[cfg(test)]
mod tests {
    use std::str;
    use crate::Token;
    use crate::scan;

    #[test]
    fn parse_stuff() {

        let log = &b"docker INFO 2019-10-01 08:33:58,878 [1] schedule:run Running job Every 1 hour do _precache_systems_streaks() (last run: 2018-09-21 07:33:58, next run: 2018-09-21 08:33:58)";

        let seq = scan(&log[..]);

        for t in seq {
            let text = match t {
                Token::Field(data) => format!("f[{}]", str::from_utf8(data).unwrap()),
                Token::Separator(data) => format!("s[{}]", str::from_utf8(data).unwrap()),
            };

            println!("{}", text);
        }
    }
}

Cargo.toml

[dependencies]
nom = "4.0"

output

f[docker]
s[ ]
f[INFO]
s[ ]
f[2019-10-01]
s[ ]
f[08:33:58]
s[,]
f[878]
s[ ]
f[[1]]
s[ ]
f[schedule:run]
s[ ]
f[Running]
s[ ]
f[job]
s[ ]
f[Every]
s[ ]
f[1]
s[ ]
f[hour]
s[ ]
f[do]
s[ ]
f[_precache_systems_streaks()]
s[ ]
f[(last]
s[ ]
f[run:]
s[ ]
f[2018-09-21]
s[ ]
f[07:33:58]
s[, ]
f[next]
s[ ]
f[run:]
s[ ]
f[2018-09-21]
s[ ]
mchlstckl asked Oct 09 '18 11:10


1 Answer

The issue you're running into is that nom assumes by default that more input may follow, unless you tell it otherwise. The last field, 08:33:58), runs to the end of the buffer without hitting a separator, so is_not! cannot tell whether the field is finished and reports Incomplete; alt_complete! turns that into an error, and many1! stops and returns the tokens it has collected so far.

Since you know your input here is complete, you need to feed the parsers your literal wrapped in a CompleteByteSlice (or, if you are using a &str, a CompleteStr); both live in nom::types. These types are thin wrappers that tell nom no more input is coming. With them, a parser that cannot finish its match returns an Error instead of an Incomplete, and in this case the parser consumes that final token rather than asking for more characters.
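To make the streaming-versus-complete distinction concrete, here is a minimal std-only sketch of the behaviour. It is a hand-rolled model, not nom's implementation: `Outcome`, `Mode`, `take_run` and `tokenize` are hypothetical names invented for illustration. `Mode::Streaming` stands in for parsing a plain `&[u8]`, and `Mode::Complete` stands in for wrapping the input in `CompleteByteSlice`.

```rust
/// Result of one parsing step in this toy model (hypothetical type,
/// loosely following nom's vocabulary).
#[derive(Debug, PartialEq)]
enum Outcome<'a> {
    Done { rest: &'a [u8], matched: &'a [u8] },
    Incomplete, // "I hit the end of the buffer; more input might extend the match"
    Error,      // "this parser does not match here"
}

/// Whether the caller promises that the buffer is the whole input.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Streaming,
    Complete,
}

fn is_sep(b: u8) -> bool {
    b == b' ' || b == b','
}

/// Match a non-empty run of separators (`want_sep == true`) or of
/// non-separators (`want_sep == false`) at the start of `input`.
fn take_run<'a>(input: &'a [u8], want_sep: bool, mode: Mode) -> Outcome<'a> {
    match input.iter().position(|&b| is_sep(b) != want_sep) {
        // The very first byte is of the wrong kind.
        Some(0) => Outcome::Error,
        // The run ends at a byte of the other kind, so the match is certain.
        Some(n) => Outcome::Done { rest: &input[n..], matched: &input[..n] },
        // The run reaches the end of the buffer.
        None => match mode {
            Mode::Streaming => Outcome::Incomplete,
            Mode::Complete if input.is_empty() => Outcome::Error,
            Mode::Complete => Outcome::Done { rest: &input[input.len()..], matched: input },
        },
    }
}

/// Collect tokens until neither alternative succeeds. Treating `Incomplete`
/// like a failure models what `alt_complete!` inside `many1!` does.
fn tokenize<'a>(mut input: &'a [u8], mode: Mode) -> (Vec<&'a [u8]>, &'a [u8]) {
    let mut tokens = Vec::new();
    loop {
        let step = match take_run(input, true, mode) {
            Outcome::Done { rest, matched } => Some((rest, matched)),
            _ => match take_run(input, false, mode) {
                Outcome::Done { rest, matched } => Some((rest, matched)),
                _ => None,
            },
        };
        match step {
            Some((rest, matched)) => {
                tokens.push(matched);
                input = rest;
            }
            None => return (tokens, input),
        }
    }
}
```

Calling `tokenize(b"next run: 08:33:58)", Mode::Streaming)` returns four tokens and leaves `08:33:58)` unconsumed, which is the symptom in the question. The same call with `Mode::Complete` returns five tokens and an empty remainder.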

Zarenor answered Oct 10 '22 12:10