Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using the same iterator multiple times in Rust

Tags:

rust

Editor's note: This code example is from a version of Rust prior to 1.0 when many iterators implemented Copy. Updated versions of this code produce a different errors, but the answers still contain valuable information.

I'm trying to write a function to split a string into clumps of letters and numbers; for example, "test123test" would turn into [ "test", "123", "test" ]. Here's my attempt so far:

pub fn split(input: &str) -> Vec<String> {
    let mut bits: Vec<String> = vec![];
    let mut iter = input.chars().peekable();
    loop {
        match iter.peek() {
            None => return bits,
            Some(c) => if c.is_digit() {
                bits.push(iter.take_while(|c| c.is_digit()).collect());
            } else {
                bits.push(iter.take_while(|c| !c.is_digit()).collect());
            }
        }
    }
    return bits;
}

However, this doesn't work, looping forever. It seems that it is using a clone of iter each time I call take_while, starting from the same position over and over again. I would like it to use the same iter each time, advancing the same iterator over all the each_times. Is this possible?

like image 507
Ben S Avatar asked May 31 '14 10:05

Ben S


1 Answers

As you identified, each take_while call is duplicating iter, since take_while takes self and the Peekable chars iterator is Copy. (Only true before Rust 1.0 — editor)

You want to be modifying the iterator each time, that is, for take_while to be operating on an &mut to your iterator. Which is exactly what the .by_ref adaptor is for:

pub fn split(input: &str) -> Vec<String> {
    let mut bits: Vec<String> = vec![];
    let mut iter = input.chars().peekable();
    loop {
        match iter.peek().map(|c| *c) {
            None => return bits,
            Some(c) => if c.is_digit(10) {
                bits.push(iter.by_ref().take_while(|c| c.is_digit(10)).collect());
            } else {
                bits.push(iter.by_ref().take_while(|c| !c.is_digit(10)).collect());
            },
        }
    }
}

fn main() {
    println!("{:?}", split("123abc456def"))
}

Prints

["123", "bc", "56", "ef"]

However, I imagine this is not correct.

I would actually recommend writing this as a normal for loop, using the char_indices iterator:

pub fn split(input: &str) -> Vec<String> {
    let mut bits: Vec<String> = vec![];
    if input.is_empty() {
        return bits;
    }

    let mut is_digit = input.chars().next().unwrap().is_digit(10);
    let mut start = 0;

    for (i, c) in input.char_indices() {
        let this_is_digit = c.is_digit(10);
        if is_digit != this_is_digit {
            bits.push(input[start..i].to_string());
            is_digit = this_is_digit;
            start = i;
        }
    }

    bits.push(input[start..].to_string());
    bits
}

This form also allows for doing this with much fewer allocations (that is, the Strings are not required), because each returned value is just a slice into the input, and we can use lifetimes to state this:

pub fn split<'a>(input: &'a str) -> Vec<&'a str> {
    let mut bits = vec![];
    if input.is_empty() {
        return bits;
    }

    let mut is_digit = input.chars().next().unwrap().is_digit(10);
    let mut start = 0;

    for (i, c) in input.char_indices() {
        let this_is_digit = c.is_digit(10);
        if is_digit != this_is_digit {
            bits.push(&input[start..i]);
            is_digit = this_is_digit;
            start = i;
        }
    }

    bits.push(&input[start..]);
    bits
}

All that changed was the type signature, removing the Vec<String> type hint and the .to_string calls.

One could even write an iterator like this, to avoid having to allocate the Vec. Something like fn split<'a>(input: &'a str) -> Splits<'a> { /* construct a Splits */ } where Splits is a struct that implements Iterator<&'a str>.

like image 114
huon Avatar answered Oct 19 '22 19:10

huon