Editor's note: This code example is from a version of Rust prior to 1.0 when many iterators implemented
Copy
. Updated versions of this code produce a different errors, but the answers still contain valuable information.
I'm trying to write a function to split a string into clumps of letters and numbers; for example, "test123test"
would turn into [ "test", "123", "test" ]
. Here's my attempt so far:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
let mut iter = input.chars().peekable();
loop {
match iter.peek() {
None => return bits,
Some(c) => if c.is_digit() {
bits.push(iter.take_while(|c| c.is_digit()).collect());
} else {
bits.push(iter.take_while(|c| !c.is_digit()).collect());
}
}
}
return bits;
}
However, this doesn't work, looping forever. It seems that it is using a clone of iter
each time I call take_while
, starting from the same position over and over again. I would like it to use the same iter
each time, advancing the same iterator over all the each_time
s. Is this possible?
As you identified, each take_while
call is duplicating iter
, since take_while
takes self
and the Peekable
chars iterator is Copy
. (Only true before Rust 1.0 — editor)
You want to be modifying the iterator each time, that is, for take_while
to be operating on an &mut
to your iterator. Which is exactly what the .by_ref
adaptor is for:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
let mut iter = input.chars().peekable();
loop {
match iter.peek().map(|c| *c) {
None => return bits,
Some(c) => if c.is_digit(10) {
bits.push(iter.by_ref().take_while(|c| c.is_digit(10)).collect());
} else {
bits.push(iter.by_ref().take_while(|c| !c.is_digit(10)).collect());
},
}
}
}
fn main() {
println!("{:?}", split("123abc456def"))
}
Prints
["123", "bc", "56", "ef"]
However, I imagine this is not correct.
I would actually recommend writing this as a normal for
loop, using the char_indices
iterator:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
if input.is_empty() {
return bits;
}
let mut is_digit = input.chars().next().unwrap().is_digit(10);
let mut start = 0;
for (i, c) in input.char_indices() {
let this_is_digit = c.is_digit(10);
if is_digit != this_is_digit {
bits.push(input[start..i].to_string());
is_digit = this_is_digit;
start = i;
}
}
bits.push(input[start..].to_string());
bits
}
This form also allows for doing this with much fewer allocations (that is, the String
s are not required), because each returned value is just a slice into the input
, and we can use lifetimes to state this:
pub fn split<'a>(input: &'a str) -> Vec<&'a str> {
let mut bits = vec![];
if input.is_empty() {
return bits;
}
let mut is_digit = input.chars().next().unwrap().is_digit(10);
let mut start = 0;
for (i, c) in input.char_indices() {
let this_is_digit = c.is_digit(10);
if is_digit != this_is_digit {
bits.push(&input[start..i]);
is_digit = this_is_digit;
start = i;
}
}
bits.push(&input[start..]);
bits
}
All that changed was the type signature, removing the Vec<String>
type hint and the .to_string
calls.
One could even write an iterator like this, to avoid having to allocate the Vec
. Something like fn split<'a>(input: &'a str) -> Splits<'a> { /* construct a Splits */ }
where Splits
is a struct that implements Iterator<&'a str>
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With