I am trying to iterate over characters in stdin. The Read.chars()
method achieves this goal, but is unstable. The obvious alternative is to use Read.lines()
with a flat_map
to convert it to a character iterator.
This seems like it should work, but doesn't, resulting in borrowed value does not live long enough
errors.
use std::io::BufRead;
fn main() {
let stdin = std::io::stdin();
let mut lines = stdin.lock().lines();
let mut chars = lines.flat_map(|x| x.unwrap().chars());
}
This is mentioned in Read file character-by-character in Rust, but it does't really explain why.
What I am particularly confused about is how this differs from the example in the documentation for flat_map
, which uses flat_map
to apply .chars()
to a vector of strings. I don't really see how that should be any different. The main difference I see is that my code needs to call unwrap()
as well, but changing the last line to the following does not work either:
let mut chars = lines.map(|x| x.unwrap());
let mut chars = chars.flat_map(|x| x.chars());
It fails on the second line, so the issue doesn't appear to be the unwrap
.
Why does this last line not work, when the very similar line in the documentation doesn't? Is there any way to get this to work?
Start by figuring out what the type of the closure's variable is:
let mut chars = lines.flat_map(|x| {
let () = x;
x.unwrap().chars()
});
This shows it's a Result<String, io::Error>
. After unwrap
ping it, it will be a String
.
Next, look at str::chars
:
fn chars(&self) -> Chars
And the definition of Chars
:
pub struct Chars<'a> {
// some fields omitted
}
From that, we can tell that calling chars
on a string returns an iterator that has a reference to the string.
Whenever we have a reference, we know that the reference cannot outlive the thing that it is borrowed from. In this case, x.unwrap()
is the owner. The next thing to check is where that ownership ends. In this case, the closure owns the String
, so at the end of the closure, the value is dropped and any references are invalidated.
Except the code tried to return a Chars
that still referred to the string. Oops. Thanks to Rust, the code didn't segfault!
The difference with the example that works is all in the ownership. In that case, the strings are owned by a vector outside of the loop and they do not get dropped before the iterator is consumed. Thus there are no lifetime issues.
What this code really wants is an into_chars
method on String
. That iterator could take ownership of the value and return characters.
Not the maximum efficiency, but a good start:
struct IntoChars {
s: String,
offset: usize,
}
impl IntoChars {
fn new(s: String) -> Self {
IntoChars { s: s, offset: 0 }
}
}
impl Iterator for IntoChars {
type Item = char;
fn next(&mut self) -> Option<Self::Item> {
let remaining = &self.s[self.offset..];
match remaining.chars().next() {
Some(c) => {
self.offset += c.len_utf8();
Some(c)
}
None => None,
}
}
}
use std::io::BufRead;
fn main() {
let stdin = std::io::stdin();
let lines = stdin.lock().lines();
let chars = lines.flat_map(|x| IntoChars::new(x.unwrap()));
for c in chars {
println!("{}", c);
}
}
See also:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With