
Why does .flat_map() with .chars() not work with std::io::Lines, but does with a vector of Strings?

Tags:

iterator

rust

I am trying to iterate over characters in stdin. The Read::chars() method achieves this goal, but is unstable. The obvious alternative is to use BufRead::lines() with a flat_map to convert it to a character iterator.

This seems like it should work, but doesn't, resulting in "borrowed value does not live long enough" errors.

use std::io::BufRead;

fn main() {
    let stdin = std::io::stdin();
    let mut lines = stdin.lock().lines();
    let mut chars = lines.flat_map(|x| x.unwrap().chars());
}

This is mentioned in Read file character-by-character in Rust, but it doesn't really explain why.

What I am particularly confused about is how this differs from the example in the documentation for flat_map, which uses flat_map to apply .chars() to a vector of strings. I don't really see how that should be any different. The main difference I see is that my code needs to call unwrap() as well, but changing the last line to the following does not work either:

let mut chars = lines.map(|x| x.unwrap());
let mut chars = chars.flat_map(|x| x.chars());

It fails on the second line, so the issue doesn't appear to be the unwrap.

Why does this last line not work, when the very similar line in the documentation does? Is there any way to get this to work?

Ian D. Scott asked Nov 01 '16 00:11


1 Answer

Start by figuring out what the type of the closure's variable is:

let mut chars = lines.flat_map(|x| {
    let () = x;
    x.unwrap().chars()
});

The resulting compiler error shows that x is a Result<String, io::Error>. After unwrapping it, it will be a String.
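If a deliberate compile error feels too abrupt, the same information can be read at run time. The sketch below is not from the original answer: type_of is a hypothetical helper built on std::any::type_name, and an in-memory Cursor stands in for stdin. The exact text that type_name returns is not guaranteed to be stable across compiler versions, so treat it as a debugging aid only.

```rust
use std::any::type_name;
use std::io::{BufRead, Cursor};

// Hypothetical helper: returns the compiler's name for the type of `_value`.
fn type_of<T>(_value: &T) -> &'static str {
    type_name::<T>()
}

fn main() {
    // Assumption: a Cursor over a string literal replaces stdin for the demo.
    let mut lines = Cursor::new("hello\n").lines();
    let first = lines.next().unwrap();

    // Prints the item type that the flat_map closure would receive.
    println!("{}", type_of(&first));
}
```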

Next, look at str::chars:

fn chars(&self) -> Chars<'_>

And the definition of Chars:

pub struct Chars<'a> {
    // some fields omitted
}

The lifetime parameter 'a on Chars tells us that calling chars on a string returns an iterator that holds a reference to that string.

Whenever we have a reference, we know that the reference cannot outlive the thing that it is borrowed from. Here, the String produced by x.unwrap() is the owner. The next thing to check is where that ownership ends. The closure owns the String, so at the end of the closure, the value is dropped and any references to it are invalidated.

Except the code tried to return a Chars that still referred to the string. Oops. Thanks to Rust, the code didn't segfault!
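One way to sidestep the dangling borrow, offered here as a sketch rather than as part of the original answer, is to copy each line's characters into an owned Vec<char> before the closure returns. The helper name chars_via_vec and the in-memory Cursor input are assumptions made for illustration.

```rust
use std::io::{BufRead, Cursor};

// Hypothetical helper: each line's characters are collected into an owned
// Vec<char>, so nothing borrowed from the temporary String leaves the closure.
fn chars_via_vec<R: BufRead>(reader: R) -> Vec<char> {
    reader
        .lines()
        .flat_map(|line| line.unwrap().chars().collect::<Vec<char>>())
        .collect()
}

fn main() {
    // Assumption: a Cursor over a string literal replaces stdin for the demo.
    let input = Cursor::new("ab\ncd\n");
    println!("{:?}", chars_via_vec(input));
}
```

This costs one allocation per line, which is usually acceptable for small inputs but is not free.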

The difference from the documentation's example is all in the ownership. There, the strings are owned by a vector outside of the closure, and they are not dropped before the iterator is consumed. Thus there are no lifetime issues.
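The working case can be sketched as follows, assuming a vector of Strings that the caller keeps alive. The function name chars_from_borrowed is hypothetical. Because the closure only receives a reference into the vector, each Chars borrows from data that outlives the whole iterator chain.

```rust
// Hypothetical helper: `words` is owned by the caller, so the `Chars`
// iterators created inside the closure borrow from data that stays alive.
fn chars_from_borrowed(words: &[String]) -> Vec<char> {
    words.iter().flat_map(|s| s.chars()).collect()
}

fn main() {
    let words = vec![String::from("ab"), String::from("cd")];
    let chars = chars_from_borrowed(&words);
    println!("{:?}", chars);
}
```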

What this code really wants is an into_chars method on String. That iterator could take ownership of the value and return characters.


This is not maximally efficient, but it is a good start:

struct IntoChars {
    s: String,
    offset: usize,
}

impl IntoChars {
    fn new(s: String) -> Self {
        IntoChars { s, offset: 0 }
    }
}

impl Iterator for IntoChars {
    type Item = char;

    fn next(&mut self) -> Option<Self::Item> {
        let remaining = &self.s[self.offset..];

        match remaining.chars().next() {
            Some(c) => {
                self.offset += c.len_utf8();
                Some(c)
            }
            None => None,
        }
    }
}

use std::io::BufRead;

fn main() {
    let stdin = std::io::stdin();
    let lines = stdin.lock().lines();
    let chars = lines.flat_map(|x| IntoChars::new(x.unwrap()));

    for c in chars {
        println!("{}", c);
    }
}
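To try the owning iterator without typing into a terminal, the sketch below feeds it from an in-memory buffer. The Cursor input and the chars_of wrapper are assumptions added for the demonstration, and next is condensed with the ? operator. The input contains multi-byte characters to exercise the len_utf8 offset arithmetic.

```rust
use std::io::{BufRead, Cursor};

// Same idea as the iterator above: own the String, track a byte offset.
struct IntoChars {
    s: String,
    offset: usize,
}

impl IntoChars {
    fn new(s: String) -> Self {
        IntoChars { s, offset: 0 }
    }
}

impl Iterator for IntoChars {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        // Decode the next char from the unread tail; stop when it is empty.
        let c = self.s[self.offset..].chars().next()?;
        // Advance by the char's encoded width, not by one byte.
        self.offset += c.len_utf8();
        Some(c)
    }
}

// Hypothetical wrapper: turns any BufRead into an iterator of chars.
fn chars_of<R: BufRead>(reader: R) -> impl Iterator<Item = char> {
    reader.lines().flat_map(|line| IntoChars::new(line.unwrap()))
}

fn main() {
    let input = Cursor::new("héllo\nwörld\n");
    let collected: String = chars_of(input).collect();
    println!("{}", collected);
}
```

Note that lines() strips the line terminators, so newline characters never reach the char iterator.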

See also:

  • How can I store a Chars iterator in the same struct as the String it is iterating on?
  • Is there an owned version of String::chars?

Shepmaster answered Nov 15 '22 07:11