
Iterate over a string, n elements at a time

I'm trying to iterate over a string in slices of length n instead of iterating over every character. The following code accomplishes this manually, but is there a more functional way to do it?

fn main() {
    let string = "AAABBBCCC";
    let offset = 3;
    for (i, _) in string.chars().enumerate() {
        if i % offset == 0 {
            println!("{}", &string[i..(i+offset)]);
        }
    }
}
anderspitman asked Apr 16 '15 09:04


3 Answers

I would use a combination of Peekable and Take:

fn main() {
    let string = "AAABBBCCC";
    let mut z = string.chars().peekable();
    while z.peek().is_some() {
        let chunk: String = z.by_ref().take(3).collect();
        println!("{}", chunk);
    }
}
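
As a rough sketch, the same Peekable and Take loop can be wrapped in a reusable function. The name `char_chunks` and the choice to return a `Vec<String>` are assumptions made for illustration, not part of the answer above. It counts chars rather than bytes, and the last chunk is shorter when the length is not a multiple of n:

```rust
// Hypothetical helper (name and return type are illustrative): split `s`
// into chunks of at most `n` chars, allocating one String per chunk.
fn char_chunks(s: &str, n: usize) -> Vec<String> {
    // With n == 0, `take` would consume nothing and the loop would never end.
    assert!(n > 0, "chunk size must be non-zero");
    let mut chars = s.chars().peekable();
    let mut chunks: Vec<String> = Vec::new();
    while chars.peek().is_some() {
        // `by_ref` lends the iterator to `take`, so the loop can keep using it.
        chunks.push(chars.by_ref().take(n).collect());
    }
    chunks
}

fn main() {
    for chunk in char_chunks("AAABBBCC", 3) {
        println!("{}", chunk);
    }
}
```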

In other cases, Itertools::chunks might do the trick:

extern crate itertools;

use itertools::Itertools;

fn main() {
    let string = "AAABBBCCC";
    for chunk in &string.chars().chunks(3) {
        for c in chunk {
            print!("{}", c);
        }
        println!();
    }
}

Standard warning about splitting strings

Be aware of issues with bytes / characters / code points / graphemes whenever you start splitting strings. With anything more complicated than ASCII characters, one character is not one byte, and string slicing operates on bytes! There is also the concept of Unicode code points, but multiple code points may combine to form what a human thinks of as a single character. This stuff is non-trivial.
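
To make the warning concrete, here is a small sketch using only standard library methods. The helper `byte_and_char_len` is a made-up name for illustration; the strings are written with escapes so it is unambiguous which code points they contain:

```rust
// Hypothetical helper for illustration: (length in bytes, length in chars).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // "héllo" with a precomposed é (U+00E9), which takes two bytes in UTF-8.
    let s = "h\u{e9}llo";
    println!("{:?}", byte_and_char_len(s)); // (6, 5)

    // Byte index 2 falls inside 'é', so it is not a valid slice boundary.
    println!("{}", s.is_char_boundary(2)); // false
    // `get` returns None there, whereas `&s[0..2]` would panic.
    println!("{:?}", s.get(0..2)); // None
    println!("{:?}", s.get(0..3)); // Some("hé")

    // 'e' followed by a combining acute accent (U+0301): one visible
    // character, but two chars and three bytes.
    println!("{:?}", byte_and_char_len("e\u{301}")); // (3, 2)
}
```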

If you actually just have ASCII data, it may be worth storing it as such, perhaps in a Vec<u8>. At the very least, I'd create a newtype that wraps a &str, exposes only ASCII-safe methods, and validates that the data is ASCII when it is created.
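
One possible shape for such a newtype is sketched below. The type name `AsciiStr` and its methods are assumptions for illustration, not an existing standard library API. Validation happens once in the constructor, after which chunking by bytes is safe:

```rust
/// Hypothetical newtype: a string slice that has been checked to be ASCII.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AsciiStr<'a>(&'a str);

impl<'a> AsciiStr<'a> {
    /// Returns `None` if `s` contains any non-ASCII byte.
    fn new(s: &'a str) -> Option<AsciiStr<'a>> {
        if s.is_ascii() {
            Some(AsciiStr(s))
        } else {
            None
        }
    }

    /// Chunks of at most `n` bytes, which equals `n` chars for ASCII data.
    /// Panics if `n` is zero, as `slice::chunks` does.
    fn chunks(self, n: usize) -> impl Iterator<Item = &'a str> {
        self.0
            .as_bytes()
            .chunks(n)
            // Cannot fail: any slice of ASCII bytes is valid UTF-8.
            .map(|c| std::str::from_utf8(c).expect("ASCII is valid UTF-8"))
    }
}

fn main() {
    let ascii = AsciiStr::new("AAABBBCCC").expect("input is ASCII");
    for chunk in ascii.chunks(3) {
        println!("{}", chunk);
    }
    // Non-ASCII input is rejected up front instead of panicking later.
    println!("{:?}", AsciiStr::new("h\u{e9}llo")); // None
}
```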

Shepmaster answered Nov 12 '22 09:11


chunks() is not available for &str because it is not really well-defined on strings: do you want chunks with length in bytes, or characters, or grapheme clusters? If you know in advance that your string is ASCII, you can use the following code:

use std::str;

fn main() {
    let string = "AAABBBCCC";
    for chunk in str_chunks(string, 3) {
        println!("{}", chunk);
    }
}

fn str_chunks<'a>(s: &'a str, n: usize) -> Box<dyn Iterator<Item = &'a str> + 'a> {
    Box::new(s.as_bytes().chunks(n).map(|c| str::from_utf8(c).unwrap()))
}

However, it will panic as soon as a chunk boundary falls inside a multi-byte (non-ASCII) character, because from_utf8 then fails. I'm pretty sure that it is possible to implement an iterator which splits a string into chunks of code points or grapheme clusters; there is just no such thing in the standard library at the moment.
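
As a sketch of the code-point variant, the following uses only the standard library (`std::iter::from_fn` and `char_indices`). The function name `char_chunk_slices` is an assumption for illustration. It borrows from the input instead of allocating, and it still splits grapheme clusters, which would need an external crate to handle:

```rust
use std::iter;

// Hypothetical helper: yields `&str` chunks of at most `n` chars each,
// borrowing from `s` rather than allocating new strings.
fn char_chunk_slices<'a>(s: &'a str, n: usize) -> impl Iterator<Item = &'a str> {
    assert!(n > 0, "chunk size must be non-zero");
    let mut rest = s;
    iter::from_fn(move || {
        if rest.is_empty() {
            return None;
        }
        // Byte offset where the (n+1)-th char starts, or the end of the
        // string if fewer than n+1 chars remain.
        let end = rest.char_indices().nth(n).map_or(rest.len(), |(i, _)| i);
        let (chunk, tail) = rest.split_at(end);
        rest = tail;
        Some(chunk)
    })
}

fn main() {
    for chunk in char_chunk_slices("AAABBBCC", 3) {
        println!("{}", chunk);
    }
}
```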

Vladimir Matveev answered Nov 12 '22 09:11


You can always implement your own iterator. Of course, that still requires quite a bit of code, but it does not live at the place where you are working with the string, so your loop stays readable.

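struct StringChunks<'a> {
    s: &'a str,
    step: usize,
    // Number of chars (not bytes) left in `s`.
    n: usize,
}

impl<'a> StringChunks<'a> {
    fn new(s: &'a str, step: usize) -> StringChunks<'a> {
        StringChunks {
            s,
            step,
            n: s.chars().count(),
        }
    }
}

impl<'a> Iterator for StringChunks<'a> {
    type Item = &'a str;
    fn next(&mut self) -> Option<&'a str> {
        // Stop when fewer than `step` chars remain (a shorter trailing
        // chunk is dropped); a zero step would never make progress.
        if self.step == 0 || self.step > self.n {
            return None;
        }
        // Byte offset just past the first `step` chars. This replaces the
        // unstable `slice_chars`, which was removed from the standard library.
        let end = self.s.char_indices().nth(self.step).map_or(self.s.len(), |(i, _)| i);
        let (ret, rest) = self.s.split_at(end);
        self.s = rest;
        self.n -= self.step;
        Some(ret)
    }
}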

fn main() {
    let string = "AAABBBCCC";
    for s in StringChunks::new(string, 3) {
        println!("{}", s);
    }
}

Note that this splits after every n Unicode chars (code points), so grapheme clusters or similar might still end up split across chunks.

oli_obk answered Nov 12 '22 10:11