 

Split string in Rust, treating consecutive delimiters as one

Tags:

rust

How do I split a string in Rust such that contiguous delimiters are collapsed into one? For example:

"1  2 3".splitX(" ")

should yield this Vec: ["1", "2", "3"] (when collected from the Split object, or any other intermediate object there may be). This example is for whitespace but we should be able to extend this for other delimiters too.

I believe we can use .filter() to remove empty items after using .split(), but it would be cleaner if it could be done as part of the original .split() directly. I obviously searched this thoroughly and am surprised I can't find the answer anywhere.

I know for whitespace we already have split_whitespace() and split_ascii_whitespace(), but I am looking for a solution that works for a general delimiter string.
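To see why plain `split` falls short, the sketch below compares it with `split_whitespace`. Every delimiter occurrence ends a piece, so two adjacent delimiters leave an empty `""` between them. `split_whitespace` collapses runs, but only runs of whitespace. The helper names `raw_split` and `whitespace_split` are hypothetical and exist only for illustration.

```rust
/// Plain `str::split`: each delimiter ends a piece, so two adjacent
/// delimiters produce an empty "" piece between them.
fn raw_split<'a>(input: &'a str, delim: &str) -> Vec<&'a str> {
    input.split(delim).collect()
}

/// `split_whitespace` collapses runs, but only for whitespace characters;
/// any other delimiter is left untouched.
fn whitespace_split(input: &str) -> Vec<&str> {
    input.split_whitespace().collect()
}
```

For example, `raw_split("1  2 3", " ")` gives `["1", "", "2", "3"]`, while `whitespace_split("1,,2,3")` returns the whole string as a single piece.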

asked May 11 '26 13:05 by Gaurang Tandon


2 Answers

The standard solution is to use split then filter:

let output: Vec<&str> = input
    .split(pattern)
    .filter(|s| !s.is_empty())
    .collect();

This is fast and clear.
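Here is a minimal sketch of the split-then-filter approach wrapped in a helper for an arbitrary delimiter string. The function name `split_collapsed` is hypothetical. The same filter also removes the empty pieces produced by delimiters at the start or end of the input.

```rust
/// Splits `input` on `delim` and drops the empty pieces that appear between
/// consecutive delimiters and at either end of the string.
fn split_collapsed<'a>(input: &'a str, delim: &str) -> Vec<&'a str> {
    input
        .split(delim)
        .filter(|s| !s.is_empty())
        .collect()
}
```

This works for multi-character delimiters too: `split_collapsed("a--b----c", "--")` yields `["a", "b", "c"]`. There is one caveat. A run is only collapsed in whole multiples of the delimiter, so `split_collapsed("a---b", "--")` yields `["a", "-b"]`.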

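You can also use a regular expression (from the regex crate) to avoid the filter step. Note that a delimiter at the very start or end of the input still produces an empty string: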

let output: Vec<&str> = regex::Regex::new(" +").unwrap()
    .split(input)
    .collect();

If it's in a function which will be called several times, you can avoid repeating the Regex compilation with lazy_regex:

let output: Vec<&str> = lazy_regex::regex!(" +")
    .split(input)
    .collect();
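If the delimiter is a set of characters (roughly what a regex like `[,; ]+` matches), std alone can handle it. `str::split` also accepts a slice of `char`s as its pattern, and the same filter then collapses the runs. This is a swapped-in, std-only alternative rather than the regex approach itself. The helper name `split_any_collapsed` is hypothetical. Unlike `Regex::split`, which would still yield an empty string for a delimiter at either end, the filter here drops those pieces too.

```rust
/// Splits on any character in `delims`, collapsing runs of them and
/// dropping empty pieces at the start and end.
fn split_any_collapsed<'a>(input: &'a str, delims: &[char]) -> Vec<&'a str> {
    input
        .split(delims) // &[char] is a std Pattern: matches any listed char
        .filter(|s| !s.is_empty())
        .collect()
}
```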

answered May 14 '26 12:05 by Denys Séguret


IMO, by far the cleanest way is to write .split(" ").filter(|s| !s.is_empty()). It works for all separators and the intent is obvious from reading the code.

If that's too "ugly", you could perhaps pull it into a trait:

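// `std::str::pattern::Pattern` is unstable, so this takes a `&str` delimiter.
// Returning `impl Iterator` from a trait method requires Rust 1.75+.
trait SplitNonEmpty {
    // you might want to define your own struct for the return type
    fn split_non_empty<'a>(&'a self, delim: &'a str) -> impl Iterator<Item = &'a str> + 'a;
}

impl SplitNonEmpty for str {
    fn split_non_empty<'a>(&'a self, delim: &'a str) -> impl Iterator<Item = &'a str> + 'a {
        self.split(delim).filter(|s| !s.is_empty())
    }
}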
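The trait above suggests defining your own struct for the return type. Here is a sketch of that idea: a named iterator that wraps std's `Split` and skips empty pieces. `std::str::pattern::Pattern` is unstable, so the sketch assumes a `&str` delimiter. The struct name `NonEmptySplit` is hypothetical.

```rust
use std::str::Split;

/// Hypothetical named return type: wraps std's `Split`, skipping empty pieces.
pub struct NonEmptySplit<'a> {
    inner: Split<'a, &'a str>,
}

impl<'a> Iterator for NonEmptySplit<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        // Pull from the underlying split until a non-empty piece appears.
        self.inner.find(|s| !s.is_empty())
    }
}

pub trait SplitNonEmpty {
    fn split_non_empty<'a>(&'a self, delim: &'a str) -> NonEmptySplit<'a>;
}

impl SplitNonEmpty for str {
    fn split_non_empty<'a>(&'a self, delim: &'a str) -> NonEmptySplit<'a> {
        NonEmptySplit { inner: self.split(delim) }
    }
}
```

With the trait in scope, `"1  2 3".split_non_empty(" ")` can be collected into a `Vec<&str>` like any other iterator. A named type can also be stored in other structs without boxing.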

If it's very important that this function returns a Split, you might need to refactor your code to use traits more. Do you really care that the value was created by splitting a string, or only that you can iterate over it? If it's the latter, maybe the function that consumes it should take an impl IntoIterator<Item = &'a str>?
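Here is a sketch of such a consumer. The function `sum_fields` is hypothetical. It only needs to iterate over `&str` pieces, so it accepts split-then-filter, `split_whitespace`, or even a `Vec<&str>` equally well.

```rust
use std::num::ParseIntError;

/// Hypothetical consumer: parses each piece as an integer and sums them.
/// It accepts any iterable of `&str`, not specifically a `std::str::Split`.
fn sum_fields<'a>(fields: impl IntoIterator<Item = &'a str>) -> Result<i64, ParseIntError> {
    fields.into_iter().map(str::parse::<i64>).sum()
}
```

Passing `"1  2 3".split(' ')` unfiltered would hand it an empty piece, which fails to parse. Passing `"1  2 3".split(' ').filter(|s| !s.is_empty())` or `"1  2 3".split_whitespace()` both sum to 6.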

answered May 14 '26 12:05 by cameron1024


