
What's the most sensible way to emulate lookbehind behavior in Rust regex?

The Rust regex crate states:

This crate provides a native implementation of regular expressions that is heavily based on RE2 both in syntax and in implementation. Notably, backreferences and arbitrary lookahead/lookbehind assertions are not provided.

As of this writing, "rust regex lookbehind" comes back with no results from DuckDuckGo.

I've never had to work around this before, but I can think of two approaches:

Approach 1 (forward)

  1. Iterate over .captures() for the pattern I want to use as lookbehind.
  2. Match the thing I actually wanted to match between captures. (forward)

Approach 2 (reverse)

  1. Match the pattern I really want to match.
  2. For each match, look for the lookbehind pattern until the end byte of a previous capture or the beginning of the string.

Not only does this seem like a huge pain, it also seems like a lot of edge cases are going to trip me up. Is there a better way to go about this?

Example

Given a string like:

"Fish33-Tiger2Hyena4-"

I want to extract ["33-", "2", "4-"] iff each one follows a string like "Fish".

asked Jun 22 '16 16:06 by bright-star



1 Answer

Without a motivating example, it's hard to usefully answer your question in a general way. In many cases, you can substitute lookaround operators with two regexes---one to search for candidates and another to produce the actual match you're interested in. However, this approach isn't always feasible.

If you're truly stuck, then your only option is to use a regex library that supports these features. Rust has bindings to a couple of them:

  • PCRE
  • PCRE2
  • Oniguruma

There is also a more experimental library, fancy-regex, which is built on top of the regex crate.

answered Nov 15 '22 19:11 by BurntSushi5