How do I split a string using a Rust regex and keep the delimiters?

Tags:

regex

rust

I have a string whose parts are separated by delimiters. I want to split this string with a regex while keeping the delimiters.

My current code is:

use regex::Regex; // 1.1.8

fn main() {
    let separator = Regex::new(r"([ ,.]+)").expect("Invalid regex");
    let splits: Vec<_> = separator.split("this... is a, test").collect();
    for split in splits {
        println!("\"{}\"", split);
    }
}

Its output is:

"this"
"is"
"a"
"test"

I would like to keep the separators (in this case runs of spaces, commas, and periods). The output I would like to see is:

"this"
"... "
"is"
" "
"a"
", "
"test"

Is it possible to achieve this with the regex crate, and if so, how?

This is different from the question "Split a string keeping the separators", which uses the standard library rather than the regex crate.

Asked Jul 07 '19 at 11:07 by Ian Rehwinkel




1 Answer

As documented on the Regex type:

Using the std::str::pattern methods with Regex

Note: This section requires that this crate is compiled with the pattern Cargo feature enabled, which requires nightly Rust.

Since Regex implements Pattern, you can use regexes with methods defined on &str. For example, is_match, find, find_iter and split can be replaced with str::contains, str::find, str::match_indices and str::split.

With the pattern feature enabled, you can use the techniques described in "Split a string keeping the separators":

use regex::Regex; // 1.1.8

fn split_keep<'a>(r: &Regex, text: &'a str) -> Vec<&'a str> {
    let mut result = Vec::new();
    let mut last = 0;
    for (index, matched) in text.match_indices(r) {
        if last != index {
            result.push(&text[last..index]);
        }
        result.push(matched);
        last = index + matched.len();
    }
    if last < text.len() {
        result.push(&text[last..]);
    }
    result
}

fn main() {
    let separator = Regex::new(r"([ ,.]+)").expect("Invalid regex");
    let splits = split_keep(&separator, "this... is a, test");
    for split in splits {
        println!("\"{}\"", split);
    }
}

The same documentation also hints at how to change the code so that it does not require nightly Rust:

For example, [...] find_iter [...] can be replaced with [...] str::match_indices

Apply the reverse transformation: replace str::match_indices with the stable Regex::find_iter method.

Answered Oct 24 '22 at 06:10 by Shepmaster