I'm trying to iterate over a string in slices of length n instead of iterating over every character. The following code accomplishes this manually, but is there a more functional way to do this?
fn main() {
    let string = "AAABBBCCC";
    let offset = 3;
    for (i, _) in string.chars().enumerate() {
        if i % offset == 0 {
            println!("{}", &string[i..(i + offset)]);
        }
    }
}
I would use a combination of Peekable and Take:
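fn main() {
    let string = "AAABBBCCC";
    let mut z = string.chars().peekable();
    // Keep going as long as at least one character is left.
    while z.peek().is_some() {
        // `by_ref` lets `take` borrow the iterator instead of consuming it.
        let chunk: String = z.by_ref().take(3).collect();
        println!("{}", chunk);
    }
}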
In other cases, Itertools::chunks might do the trick:
// `extern crate` is only required on the 2015 edition.
extern crate itertools;

use itertools::Itertools;

fn main() {
    let string = "AAABBBCCC";
    for chunk in &string.chars().chunks(3) {
        for c in chunk {
            print!("{}", c);
        }
        println!();
    }
}
Be aware of issues with bytes / characters / code points / graphemes whenever you start splitting strings. With anything more complicated than ASCII characters, one character is not one byte, and string slicing operates on bytes! There is also the concept of Unicode code points, but multiple code points may combine to form what a human thinks of as a single character (a grapheme cluster). This stuff is non-trivial.
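To make the byte/character distinction concrete, here is a small sketch that uses only the standard library. The sample strings and the helper name `byte_and_char_len` are made up for illustration.

```rust
/// Returns (length in bytes, length in `char`s) of `s`.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let ascii = "AAABBB";
    // Three times U+00E9 (e with acute accent); each one is two bytes in UTF-8.
    let accented = "\u{e9}\u{e9}\u{e9}";

    println!("{:?}", byte_and_char_len(ascii)); // (6, 6)
    println!("{:?}", byte_and_char_len(accented)); // (6, 3)

    // Byte index 1 falls inside the first character, so it is not a valid
    // slice boundary; `&accented[0..1]` would panic.
    println!("{}", accented.is_char_boundary(1)); // false

    // `get` returns None instead of panicking on an invalid boundary.
    println!("{}", accented.get(0..1).is_none()); // true
    println!("{}", accented.get(0..2).is_some()); // true
}
```

Both strings are six bytes long, but only the ASCII one can be sliced at every byte offset.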
If you actually just have ASCII data, it may be worth it to store it as such, perhaps in a Vec<u8>. At the very least, I'd create a newtype that wraps a &str, only exposes ASCII-safe methods, and validates that the data is ASCII when created.
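A minimal sketch of such a newtype might look like the following. The name `AsciiStr` and its `chunks` method are hypothetical, chosen for illustration; they are not part of the standard library.

```rust
/// Hypothetical newtype: a string slice that is known to be ASCII.
#[derive(Debug, Clone, Copy, PartialEq)]
struct AsciiStr<'a>(&'a str);

impl<'a> AsciiStr<'a> {
    /// Validates once, at construction; returns None for non-ASCII input.
    fn new(s: &'a str) -> Option<AsciiStr<'a>> {
        if s.is_ascii() {
            Some(AsciiStr(s))
        } else {
            None
        }
    }

    /// Splits into pieces of at most `n` bytes. Because the data is ASCII,
    /// every byte offset is a character boundary, so slicing cannot panic.
    /// Panics if `n` is zero.
    fn chunks(self, n: usize) -> impl Iterator<Item = &'a str> {
        assert!(n > 0, "chunk size must be non-zero");
        let s = self.0;
        (0..s.len())
            .step_by(n)
            .map(move |i| &s[i..(i + n).min(s.len())])
    }
}

fn main() {
    let ascii = AsciiStr::new("AAABBBCCCD").expect("input is ASCII");
    for chunk in ascii.chunks(3) {
        println!("{}", chunk); // AAA, BBB, CCC, D
    }
    // Non-ASCII input is rejected up front.
    assert!(AsciiStr::new("h\u{e9}llo").is_none());
}
```

Validating in the constructor means the slicing code never has to think about character boundaries again.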
chunks() is not available for &str because it is not really well-defined on strings: do you want chunks with length in bytes, or characters, or grapheme clusters? If you know in advance that your string is in ASCII, you can use the following code:
use std::str;

fn main() {
    let string = "AAABBBCCC";
    for chunk in str_chunks(string, 3) {
        println!("{}", chunk);
    }
}

// Splits on byte offsets; panics if a chunk is not valid UTF-8 or if `n` is 0.
fn str_chunks<'a>(s: &'a str, n: usize) -> Box<dyn Iterator<Item = &'a str> + 'a> {
    Box::new(s.as_bytes().chunks(n).map(|c| str::from_utf8(c).unwrap()))
}
However, it will panic as soon as a chunk boundary falls inside a multi-byte character, so it is only safe for ASCII input. I'm pretty sure that it is possible to implement an iterator which splits a string into chunks of code points or grapheme clusters; there is just no such thing in the standard library now.
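If a panic is not acceptable, one option is to surface the failure as a value. The following is a sketch under that assumption; `try_str_chunks` is a hypothetical name, and it collects into a Vec rather than returning a lazy iterator.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical fallible variant of `str_chunks`: returns an error instead of
/// panicking when a chunk boundary falls inside a multi-byte character.
/// Still panics if `n` is zero, because slice `chunks` does.
fn try_str_chunks(s: &str, n: usize) -> Result<Vec<&str>, Utf8Error> {
    // Collecting an iterator of Results stops at the first Err.
    s.as_bytes().chunks(n).map(str::from_utf8).collect()
}

fn main() {
    // ASCII input splits cleanly.
    println!("{}", try_str_chunks("AAABBBCCC", 3).is_ok()); // true

    // U+00E9 is two bytes, so a 3-byte boundary cuts the second one in half.
    println!("{}", try_str_chunks("\u{e9}\u{e9}\u{e9}", 3).is_err()); // true
}
```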
You can always implement your own iterator. Of course, that still requires quite some code, but it does not sit where you are working with the string, so your loop stays readable.
// `slice_chars` was never stabilized and has since been removed,
// so this version finds the chunk boundary with `char_indices` instead.
struct StringChunks<'a> {
    s: &'a str,
    step: usize,
}

impl<'a> StringChunks<'a> {
    fn new(s: &'a str, step: usize) -> StringChunks<'a> {
        StringChunks { s, step }
    }
}

impl<'a> Iterator for StringChunks<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        if self.s.is_empty() || self.step == 0 {
            return None;
        }
        // Byte offset where the chunk of `step` chars ends. If fewer chars
        // remain, take the rest, so the final chunk may be shorter.
        let end = self
            .s
            .char_indices()
            .nth(self.step)
            .map_or(self.s.len(), |(i, _)| i);
        let (ret, rest) = self.s.split_at(end);
        self.s = rest;
        Some(ret)
    }
}

fn main() {
    let string = "AAABBBCCC";
    for s in StringChunks::new(string, 3) {
        println!("{}", s);
    }
}
Note that this splits after n Unicode chars (code points), so graphemes or similar might end up split up.
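To illustrate that caveat, here is a small sketch with a deliberately simple, allocating helper (`char_chunks` is a hypothetical name, not the iterator above). It splits a string whose accented letter is written as a base letter plus a combining mark.

```rust
/// Splits `s` into owned chunks of at most `n` chars (Unicode scalar values).
/// Panics if `n` is zero, because slice `chunks` does.
fn char_chunks(s: &str, n: usize) -> Vec<String> {
    let chars: Vec<char> = s.chars().collect();
    chars.chunks(n).map(|c| c.iter().collect()).collect()
}

fn main() {
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT, then 'x'.
    // A reader sees two characters, but the string holds three chars.
    let s = "e\u{301}x";
    println!("{}", s.chars().count()); // 3

    // With a chunk size of 1, the accent is separated from its base letter.
    let chunks = char_chunks(s, 1);
    println!("{}", chunks.len()); // 3
    println!("{}", chunks[1] == "\u{301}"); // true
}
```

The standard library has no grapheme-cluster API, so splitting on what a reader perceives as characters requires an external crate such as unicode-segmentation.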