I have a string that is separated by a delimiter. I want to split this string using regex and keep the delimiters.
My current code is:
use regex::Regex; // 1.1.8

fn main() {
    let separator = Regex::new(r"([ ,.]+)").expect("Invalid regex");
    // `split` yields only the text between matches; the matches themselves are dropped.
    let splits: Vec<_> = separator.split("this... is a, test").collect();
    for split in splits {
        println!("\"{}\"", split);
    }
}
The output of which is:
"this"
"is"
"a"
"test"
I would like to keep the separators (in this case the runs of spaces, commas, and periods). The output I would like to see is:
"this"
"... "
"is"
" "
"a"
", "
"test"
How can I, if at all possible, achieve such behavior with regex?
This is different from Split a string keeping the separators, which uses the standard library and not the regex crate.
As documented on the Regex type:

Using the std::str::pattern methods with Regex

Note: This section requires that this crate is compiled with the pattern Cargo feature enabled, which requires nightly Rust.

Since Regex implements Pattern, you can use regexes with methods defined on &str. For example, is_match, find, find_iter and split can be replaced with str::contains, str::find, str::match_indices and str::split.

Using the pattern feature, you can use the techniques described in Split a string keeping the separators:
use regex::Regex; // 1.1.8

fn split_keep<'a>(r: &Regex, text: &'a str) -> Vec<&'a str> {
    let mut result = Vec::new();
    // Byte offset just past the previous match.
    let mut last = 0;
    for (index, matched) in text.match_indices(r) {
        // Push the text between the previous match and this one, if any.
        if last != index {
            result.push(&text[last..index]);
        }
        // Push the separator itself.
        result.push(matched);
        last = index + matched.len();
    }
    // Push whatever follows the final match.
    if last < text.len() {
        result.push(&text[last..]);
    }
    result
}

fn main() {
    let separator = Regex::new(r"([ ,.]+)").expect("Invalid regex");
    let splits = split_keep(&separator, "this... is a, test");
    for split in splits {
        println!("\"{}\"", split);
    }
}
This also gives you a hint on how to transform the code to not require nightly Rust:

For example, [...] find_iter [...] can be replaced with [...] str::match_indices

Apply the reverse transformation to use stable Regex methods.