I want to take a string in which characters may be repeated and split it into runs of consecutive identical characters.
For example,
aaaabbbabbbaaaacccbbbbbbbbaaa
would become
[ aaaa, bbb, a, bbb, aaaa, ccc, bbbbbbbb, aaa ]
A succinct way is to use Itertools::group_by on an iterator of chars (recent itertools releases rename this method to chunk_by and deprecate the old name):
extern crate itertools;

use itertools::Itertools;

fn main() {
    let input = "aaaabbbabbbaaaacccbbbbbbbbaaa";

    let output: Vec<String> = input
        .chars()
        .group_by(|&x| x)
        .into_iter()
        .map(|(_, r)| r.collect())
        .collect();

    assert_eq!(
        output,
        ["aaaa", "bbb", "a", "bbb", "aaaa", "ccc", "bbbbbbbb", "aaa"]
    );
}
However, this allocates a new String for each group of characters. A more efficient solution would return slices of the original string.
A (hacky) modification to the previous solution yields such slices:
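// `start` always points at the not-yet-consumed tail of `input`.
let mut start = input;
let output: Vec<&str> = input
    .chars()
    .group_by(|&x| x)
    .into_iter()
    .map(|(_, r)| {
        // `split_at` takes a byte offset, so sum the UTF-8 lengths.
        let len: usize = r.map(|c| c.len_utf8()).sum();
        let (a, b) = start.split_at(len);
        start = b;
        a
    })
    .collect();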
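The reason the snippet above sums len_utf8() instead of counting characters is that str::split_at works on byte offsets. A minimal sketch of the difference, using only the standard library; the helper names leading_run_chars and leading_run_bytes are made up for this illustration:

```rust
/// Number of `char`s in the leading run of identical characters
/// (0 for an empty string). Hypothetical helper for illustration.
pub fn leading_run_chars(s: &str) -> usize {
    match s.chars().next() {
        Some(first) => s.chars().take_while(|&c| c == first).count(),
        None => 0,
    }
}

/// Number of UTF-8 bytes in that same run. This is the value that
/// `str::split_at` expects. Hypothetical helper for illustration.
pub fn leading_run_bytes(s: &str) -> usize {
    match s.chars().next() {
        Some(first) => s
            .chars()
            .take_while(|&c| c == first)
            .map(char::len_utf8)
            .sum(),
        None => 0,
    }
}
```

For "ééa" (with é as the single code point U+00E9) the first run is 2 chars long but 4 bytes long. Calling split_at with the char count would cut between the two é, and an offset that is not on a char boundary, such as 1, would make split_at panic.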
If you think that pulling in an external crate is overkill, you can do it with the standard library alone:
fn group_chars(mut input: &str) -> Vec<&str> {
    // Byte length of the leading run of identical characters,
    // or `None` if the input is empty.
    fn first_different(mut chars: std::str::Chars) -> Option<usize> {
        chars.next().map(|f| {
            chars
                .take_while(|&c| c == f)
                .fold(f.len_utf8(), |len, c| len + c.len_utf8())
        })
    }

    let mut output = Vec::new();

    while let Some(different) = first_different(input.chars()) {
        let (before, after) = input.split_at(different);
        input = after;
        output.push(before);
    }

    output
}

fn main() {
    assert_eq!(
        group_chars("aaaabbbébbbaaaacccbbbbbbbbaaa"),
        ["aaaa", "bbb", "é", "bbb", "aaaa", "ccc", "bbbbbbbb", "aaa"]
    );
}
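A possible variation on the function above, not taken from the original answer: walk the string once with char_indices and cut a slice whenever the character changes. This is only a sketch; the name group_chars_by_index is made up here.

```rust
/// Splits `input` into runs of identical characters in a single pass.
/// Hypothetical alternative to `group_chars`, for illustration only.
pub fn group_chars_by_index(input: &str) -> Vec<&str> {
    let mut output = Vec::new();
    let mut start = 0; // byte offset where the current run begins
    let mut prev: Option<char> = None;

    for (i, c) in input.char_indices() {
        if let Some(p) = prev {
            if p != c {
                // `i` is the byte offset of `c`, so it is a valid boundary.
                output.push(&input[start..i]);
                start = i;
            }
        }
        prev = Some(c);
    }

    // Push the final run, unless the input was empty.
    if start < input.len() {
        output.push(&input[start..]);
    }

    output
}
```

Because char_indices already reports byte offsets, no len_utf8 arithmetic is needed, and every index used for slicing falls on a char boundary.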
Or you can write an iterator:
pub struct CharGroups<'a> {
    // The part of the string that has not been yielded yet.
    input: &'a str,
}

impl<'a> CharGroups<'a> {
    pub fn new(input: &'a str) -> CharGroups<'a> {
        CharGroups { input }
    }
}

impl<'a> Iterator for CharGroups<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        self.input.chars().next().map(|f| {
            // Byte offset of the first character that differs from `f`,
            // or the end of the string if the whole remainder is one run.
            let i = self.input.find(|c| c != f).unwrap_or(self.input.len());
            let (before, after) = self.input.split_at(i);
            self.input = after;
            before
        })
    }
}

fn main() {
    assert_eq!(
        CharGroups::new("aaaabbbébbbaaaacccbbbbbbbbaaa").collect::<Vec<_>>(),
        ["aaaa", "bbb", "é", "bbb", "aaaa", "ccc", "bbbbbbbb", "aaa"]
    );
}
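One thing the iterator version buys you is that it composes with the usual adapters without building an intermediate Vec of slices. As a rough sketch, here is a run-length summary built on top of it. The struct is repeated in compact form so the block compiles on its own, and run_lengths is a made-up helper, not part of the original answer:

```rust
/// Compact copy of the iterator above so this sketch is self-contained.
pub struct CharGroups<'a> {
    input: &'a str,
}

impl<'a> Iterator for CharGroups<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let first = self.input.chars().next()?;
        // Byte offset of the first character that differs from `first`.
        let end = self
            .input
            .find(|c: char| c != first)
            .unwrap_or(self.input.len());
        let (before, after) = self.input.split_at(end);
        self.input = after;
        Some(before)
    }
}

/// Hypothetical helper: (character, run length in chars) for each group.
pub fn run_lengths(input: &str) -> Vec<(char, usize)> {
    CharGroups { input }
        .filter_map(|group| {
            // Groups are never empty, but `?` avoids an unwrap.
            let c = group.chars().next()?;
            Some((c, group.chars().count()))
        })
        .collect()
}
```

The length here is counted in chars rather than bytes, since that is usually what a run-length summary is after; use group.len() instead if byte lengths are wanted.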