Iterate over a string, n elements at a time

Question

I'm trying to iterate over a string in slices of length n instead of iterating over every character. The following code accomplishes this manually, but is there a more functional way to do it?

fn main() {
    let string = "AAABBBCCC";
    let offset = 3;
    for (i, _) in string.chars().enumerate() {
        if i % offset == 0 {
            println!("{}", &string[i..(i+offset)]);
        }
    }
}

Shepmaster · Accepted Answer

I would use a combination of Peekable and Take:

fn main() {
    let string = "AAABBBCCC";
    let mut z = string.chars().peekable();
    while z.peek().is_some() {
        let chunk: String = z.by_ref().take(3).collect();
        println!("{}", chunk);
    }
}

In other cases, Itertools::chunks might do the trick:

// Requires the itertools crate as a dependency in Cargo.toml.
use itertools::Itertools;

fn main() {
    let string = "AAABBBCCC";
    for chunk in &string.chars().chunks(3) {
        for c in chunk {
            print!("{}", c);
        }
        println!();
    }
}

Standard warning about splitting strings

Be aware of issues with bytes / characters / code points / graphemes whenever you start splitting strings. With anything more complicated than ASCII characters, one character is not one byte, and string slicing operates on bytes! There is also the concept of Unicode code points, but multiple code points may combine to form what a human thinks of as a single character. This stuff is non-trivial.
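To make the byte/character mismatch concrete, here is a minimal sketch using only the standard library. The helper name byte_and_char_counts is made up for illustration; the methods it calls (len, chars, is_char_boundary, get) are standard str methods.

```rust
/// Hypothetical helper: returns (length in bytes, length in chars).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let s = "héllo"; // 'é' is U+00E9, which takes two bytes in UTF-8

    let (bytes, chars) = byte_and_char_counts(s);
    println!("bytes = {}, chars = {}", bytes, chars);

    // Byte offset 2 falls in the middle of 'é', so it is not a valid split point.
    println!("is_char_boundary(2) = {}", s.is_char_boundary(2));

    // `get` returns None on an invalid boundary, where `&s[0..2]` would panic.
    println!("{:?}", s.get(0..2));
    println!("{:?}", s.get(0..3));
}
```

Indexing with &s[0..2] on this string would panic at runtime, which is why byte-offset slicing is only safe when every boundary is known to be valid.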

If you actually just have ASCII data, it may be worth storing it as such, perhaps in a Vec<u8>. At the very least, I'd create a newtype that wraps a &str, exposes only ASCII-safe methods, and validates that the contents are ASCII when it is created.
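A rough sketch of what such a newtype might look like follows. The name AsciiStr and its methods are invented for this example and are not from any library; it assumes n is non-zero, since slice chunks panics on a chunk size of 0.

```rust
/// Hypothetical newtype: a string slice known to contain only ASCII.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AsciiStr<'a>(&'a str);

impl<'a> AsciiStr<'a> {
    /// Returns None if `s` contains any non-ASCII byte.
    fn new(s: &'a str) -> Option<AsciiStr<'a>> {
        if s.is_ascii() {
            Some(AsciiStr(s))
        } else {
            None
        }
    }

    /// Splits into pieces of at most `n` bytes. Every ASCII character is
    /// exactly one byte, so each piece is also at most `n` characters.
    /// Panics if `n` is 0 (inherited from slice `chunks`).
    fn chunks(self, n: usize) -> impl Iterator<Item = &'a str> + 'a {
        self.0
            .as_bytes()
            .chunks(n)
            // Cannot fail here: any run of ASCII bytes is valid UTF-8.
            .map(|c| std::str::from_utf8(c).expect("ASCII is valid UTF-8"))
    }
}

fn main() {
    let s = AsciiStr::new("AAABBBCCC").expect("input is ASCII");
    for chunk in s.chunks(3) {
        println!("{}", chunk);
    }

    // Non-ASCII input is rejected up front instead of panicking later.
    println!("{:?}", AsciiStr::new("héllo"));
}
```

Doing the validation once in the constructor means the chunking method can rely on the one-byte-per-character invariant without re-checking it.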

Vladimir Matveev · Answer

chunks() is not available for &str because it is not really well-defined on strings: do you want chunks with length in bytes, or characters, or grapheme clusters? If you know in advance that your string is ASCII, you can use the following code:

use std::str;

fn main() {
    let string = "AAABBBCCC";
    for chunk in str_chunks(string, 3) {
        println!("{}", chunk);
    }
}

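fn str_chunks<'a>(s: &'a str, n: usize) -> Box<dyn Iterator<Item = &'a str> + 'a> {
    // Chunk the raw bytes, then reinterpret each chunk as UTF-8.
    // The unwrap panics if a chunk is not valid UTF-8 on its own.
    Box::new(s.as_bytes().chunks(n).map(|c| str::from_utf8(c).unwrap()))
}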

However, it will panic as soon as a chunk boundary falls inside a multi-byte (non-ASCII) character, and even when it does not panic, the chunks are n bytes long rather than n characters. I'm pretty sure that it is possible to implement an iterator which splits a string into chunks of code points or grapheme clusters; there is just no such thing in the standard library now.
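If a panic is not acceptable, one option is to surface the UTF-8 error to the caller instead of unwrapping. The function try_str_chunks below is a hypothetical variant written for this illustration, not part of the answer above; like slice chunks, it assumes n is non-zero.

```rust
use std::str;

/// Hypothetical fallible variant: each item is Ok(chunk) or the Utf8Error
/// produced when a byte chunk is not valid UTF-8 on its own.
fn try_str_chunks<'a>(
    s: &'a str,
    n: usize,
) -> impl Iterator<Item = Result<&'a str, str::Utf8Error>> + 'a {
    s.as_bytes().chunks(n).map(|c| str::from_utf8(c))
}

fn main() {
    // Pure ASCII: every chunk is Ok.
    for chunk in try_str_chunks("AAABBBCCC", 3) {
        println!("{:?}", chunk);
    }

    // 'é' occupies two bytes, and a chunk size of 2 splits it in half,
    // so the chunks that contain one half each are reported as Err.
    for chunk in try_str_chunks("aébc", 2) {
        println!("{:?}", chunk);
    }
}
```

The caller can then decide whether to skip, report, or abort on an invalid chunk rather than having the iterator panic.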

oli_obk · Answer

You can always implement your own iterator. Of course that still requires quite a bit of code, but the code does not sit at the place where you are working with the string. Therefore your loop stays readable.

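// Stable-Rust version: the unstable slice_chars method used originally
// has since been removed from the standard library.
struct StringChunks<'a> {
    s: &'a str,
    step: usize,
}

impl<'a> StringChunks<'a> {
    fn new(s: &'a str, step: usize) -> StringChunks<'a> {
        // A step of 0 would yield empty chunks forever.
        assert!(step > 0, "step must be non-zero");
        StringChunks { s, step }
    }
}

impl<'a> Iterator for StringChunks<'a> {
    type Item = &'a str;
    fn next(&mut self) -> Option<&'a str> {
        if self.s.is_empty() {
            return None;
        }
        // Byte offset of the char that starts the next chunk, or the end
        // of the string if fewer than `step` chars remain (the final
        // chunk may therefore be shorter than `step`).
        let split = self.s.char_indices().nth(self.step).map_or(self.s.len(), |(i, _)| i);
        let (ret, rest) = self.s.split_at(split);
        self.s = rest;
        Some(ret)
    }
}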

fn main() {
    let string = "AAABBBCCC";
    for s in StringChunks::new(string, 3) {
        println!("{}", s);
    }
}

Note that this splits after n Unicode chars (code points), so a grapheme cluster made up of several code points might end up split across two chunks.
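As a small illustration of that caveat, the sketch below chunks by char using a throwaway helper (char_chunks is a made-up name, and it collects eagerly for simplicity). The input spells "noël" with a combining diaeresis, so the accent is a separate code point from its base letter.

```rust
/// Hypothetical helper: chunks of at most `n` chars, collected eagerly.
/// Panics if `n` is 0 (inherited from slice `chunks`).
fn char_chunks(s: &str, n: usize) -> Vec<String> {
    let chars: Vec<char> = s.chars().collect();
    chars.chunks(n).map(|c| c.iter().collect()).collect()
}

fn main() {
    // 'e' followed by U+0308 (combining diaeresis) renders as a single "ë".
    let s = "noe\u{308}l";

    // A reader sees four characters, but there are five chars.
    println!("chars = {}", s.chars().count());

    // With chunks of 3 chars, the base letter 'e' lands in the first chunk
    // and its combining mark lands at the start of the second.
    for chunk in char_chunks(s, 3) {
        println!("{:?}", chunk);
    }
}
```

Splitting on grapheme clusters instead would need Unicode segmentation data, which the standard library does not provide.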

Tags: iterator, string, rust

anderspitman
