
Rust regex pattern - unrecognized escape pattern

Tags:

regex

rust

I have the following string:

\"lengthSeconds\":\"2664\"

which I would like to match with this regexp:

Regex::new("lengthSeconds\\\":\\\"(\\d+)\\\"")

I also tried it as a raw string:

Regex::new(r#"lengthSeconds\":\"(\d+)\""#)

but I'm getting this:

regex parse error:
lengthSeconds\":\"(\d+)\"
             ^^
error: unrecognized escape sequence

What's wrong with the regexp pattern?

n1_ asked Jan 06 '19 11:01


1 Answer

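Writing the pattern as r#"..."# makes it a raw string, so Rust itself processes no escapes in it. The regex engine still treats a backslash as an escape character, though, so \" reaches it as an escape sequence, and the parser rejects it as unrecognized. To match the literal backslashes in your input, each one has to be escaped as \\ in the pattern. So this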

Regex::new(r#"\\"lengthSeconds\\":\\"(\d+)\\""#)

is what you want.

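Alternatively, with an ordinary string literal, where every backslash and quote must additionally be escaped for Rust, you could write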

Regex::new("\\\\\"lengthSeconds\\\\\":\\\\\"(\\d+)\\\\\"").unwrap();

to yield the same result.

See this example on Rust Playground
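
To see why both attempts in the question fail the same way, and why the two forms in this answer are interchangeable, it can help to compare the literals as plain strings before any regex is involved. The following is a minimal sketch using only the standard library; the function names are hypothetical and chosen for illustration only.

```rust
// Hypothetical helpers: each returns the text that Regex::new would receive.

// The answer's pattern as a raw string: Rust processes no escapes here.
fn answer_raw() -> &'static str {
    r#"\\"lengthSeconds\\":\\"(\d+)\\""#
}

// The answer's pattern as an ordinary string: Rust turns \\ into \ and \" into ".
fn answer_escaped() -> &'static str {
    "\\\\\"lengthSeconds\\\\\":\\\\\"(\\d+)\\\\\""
}

// The two attempts from the question.
fn question_raw() -> &'static str {
    r#"lengthSeconds\":\"(\d+)\""#
}

fn question_escaped() -> &'static str {
    "lengthSeconds\\\":\\\"(\\d+)\\\""
}

// The input from the question, including its literal backslashes.
fn input() -> &'static str {
    r#"\"lengthSeconds\":\"2664\""#
}

fn main() {
    // Both spellings in the answer are the same pattern text.
    assert_eq!(answer_raw(), answer_escaped());
    // Both attempts in the question are also one and the same pattern,
    // which is why switching to a raw string did not change the error.
    assert_eq!(question_raw(), question_escaped());

    println!("input:            {}", input());
    println!("question pattern: {}", question_raw());
    println!("answer pattern:   {}", answer_raw());
}
```

Printing the strings shows what the regex parser actually receives: the question's pattern contains a single backslash before each quote, while the answer's pattern contains two, so each literal backslash in the input is matched by an escaped \\ in the pattern.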

Stefan Mesken answered Oct 10 '22 23:10