
How to split string into units of each character

Tags:

rust

I want to take a string where there are characters that may be repeated and split the string into units of each character.

So for example

aaaabbbabbbaaaacccbbbbbbbbaaa

would become

[ aaaa, bbb, a, bbb, aaaa, ccc, bbbbbbbb, aaa ]
wolfenstien98 asked Oct 24 '25 23:10


2 Answers

A succinct way is to use Itertools::group_by on an iterator of chars (recent itertools releases deprecate group_by in favor of the equivalent chunk_by):

extern crate itertools;

use itertools::Itertools;

fn main() {
    let input = "aaaabbbabbbaaaacccbbbbbbbbaaa";

    let output: Vec<String> = input
        .chars()
        .group_by(|&x| x)
        .into_iter()
        .map(|(_, r)| r.collect())
        .collect();

    assert_eq!(
        output,
        ["aaaa", "bbb", "a", "bbb", "aaaa", "ccc", "bbbbbbbb", "aaa"]
    );
}

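However, this allocates a new String for each group of characters. A more efficient solution would return slices of the original string.

A (hacky) modification of the previous solution yields such slices. It keeps the unconsumed remainder of the input in start and splits off each group by its length in bytes: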

let mut start = input;
let output: Vec<&str> = input
    .chars()
    .group_by(|&x| x)
    .into_iter()
    .map(|(_, r)| {
        let len: usize = r.map(|c| c.len_utf8()).sum();
        let (a, b) = start.split_at(len);
        start = b;
        a
    })
    .collect();
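
For comparison, here is a minimal sketch of the same slice-returning idea written against the standard library only, without the mutable cursor captured by the closure. The function name group_slices and its local variables are made up for illustration and are not part of the answer above; it walks char_indices and cuts a slice whenever the character changes.

```rust
/// Splits `input` into maximal runs of one repeated character.
/// Every returned slice borrows from `input`; nothing is allocated per group.
/// (`group_slices` is a hypothetical name used only for this sketch.)
fn group_slices(input: &str) -> Vec<&str> {
    let mut output = Vec::new();
    let mut run_start = 0; // byte offset where the current run begins
    let mut prev: Option<char> = None; // last character seen, if any

    for (idx, c) in input.char_indices() {
        if let Some(p) = prev {
            if p != c {
                // The character changed: close the run that ends just before `idx`.
                output.push(&input[run_start..idx]);
                run_start = idx;
            }
        }
        prev = Some(c);
    }

    // Close the final run, unless the input was empty.
    if !input.is_empty() {
        output.push(&input[run_start..]);
    }
    output
}
```

Because char_indices reports byte offsets that always fall on character boundaries, the slicing should not panic on multi-byte characters such as é.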
Shepmaster answered Oct 28 '25 01:10


If you think that an external crate is overkill, you can do it with the standard library alone:

fn group_chars(mut input: &str) -> Vec<&str> {
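    // Returns the length in bytes of the leading run of identical characters,
    // or None if the input is empty.
    fn first_different(mut chars: std::str::Chars) -> Option<usize> {
        chars.next().map(|f| {
            chars
                .take_while(|&c| c == f)
                .fold(f.len_utf8(), |len, c| len + c.len_utf8())
        })
    }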

    let mut output = Vec::new();

    while let Some(different) = first_different(input.chars()) {
        let (before, after) = input.split_at(different);
        input = after;
        output.push(before);
    }

    output
}

fn main() {
    assert_eq!(
        group_chars("aaaabbbébbbaaaacccbbbbbbbbaaa"),
        ["aaaa", "bbb", "é", "bbb", "aaaa", "ccc", "bbbbbbbb", "aaa"]
    );
}

Or you can write an iterator that yields the groups lazily:

pub struct CharGroups<'a> {
    input: &'a str,
}

impl<'a> CharGroups<'a> {
    pub fn new(input: &'a str) -> CharGroups<'a> {
        CharGroups { input }
    }
}

impl<'a> Iterator for CharGroups<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        self.input.chars().next().map(|f| {
            let i = self.input.find(|c| c != f).unwrap_or(self.input.len());
            let (before, after) = self.input.split_at(i);
            self.input = after;
            before
        })
    }
}

fn main() {
    assert_eq!(
        CharGroups::new("aaaabbbébbbaaaacccbbbbbbbbaaa").collect::<Vec<_>>(),
        ["aaaa", "bbb", "é", "bbb", "aaaa", "ccc", "bbbbbbbb", "aaa"]
    );
}
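
If the input is known to be ASCII, as in the question's example, another standard-library option is to group the raw bytes. The sketch below assumes Rust 1.77 or later, where slices gained a chunk_by method; group_ascii is a hypothetical name, and returning None for non-ASCII input is a design choice of this sketch, made because grouping bytes would cut multi-byte characters such as é in half.

```rust
/// Groups runs of equal bytes in an ASCII string, borrowing from `input`.
/// Returns None for non-ASCII input, since byte-wise grouping would split
/// multi-byte characters. (`group_ascii` is a hypothetical name.)
fn group_ascii(input: &str) -> Option<Vec<&str>> {
    if !input.is_ascii() {
        return None;
    }
    Some(
        input
            .as_bytes()
            // Adjacent bytes stay in the same chunk while they are equal.
            .chunk_by(|a, b| a == b)
            // Each chunk is pure ASCII, so converting back to &str cannot fail.
            .map(|run| std::str::from_utf8(run).expect("ASCII is valid UTF-8"))
            .collect(),
    )
}
```

For input that may contain non-ASCII characters, the char-based versions above remain the safer choice.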
Boiethios answered Oct 28 '25 02:10


