Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do I need to collect into a vector when using `flat_map`?

I'm working on Project Euler 96 to teach myself Rust. I've written this code to read in the file and convert it into a vector of integers (Playground).

let file = File::open(&args[1]).expect("Sudoku file not found");
let reader = BufReader::new(file);

let x = reader
    .lines()
    .map(|x| x.unwrap())
    .filter(|x| !x.starts_with("Grid"))
    .flat_map(|s| s.chars().collect::<Vec<_>>())  // <-- collect here!
    .map(|x| x.to_digit(10).unwrap())
    .collect::<Vec<_>>();

This all works fine but I'm puzzled why I have to collect into a vector in my flat_map (I'm assuming creating unneeded vectors which will be immediately destroyed is inefficient). If I don't collect, it doesn't compile:

error[E0515]: cannot return value referencing function parameter `s`
  --> src/main.rs:13:23
   |
13 |         .flat_map(|s| s.chars())
   |                       -^^^^^^^^
   |                       |
   |                       returns a value referencing data owned by the current function
   |                       `s` is borrowed here

The example from the docs shows almost the same code, but a collect is not required:

let words = ["alpha", "beta", "gamma"];

// chars() returns an iterator
let merged: String = words.iter()
                          .flat_map(|s| s.chars())
                          .collect();
assert_eq!(merged, "alphabetagamma");

So why is it different in my code?

like image 653
Nick Long Avatar asked Sep 12 '19 08:09

Nick Long


2 Answers

The iterator reader.lines().map(|x| x.unwrap()) iterates over String items, i.e. by value. Consequently, in .flat_map(|s| ...), the variable s has the type String (i.e. owned, not borrowed). In other words: the string is a local variable now and lives in the function. It's a simple rule that you cannot return references to local variables (see this Q&A). But that's exactly what s.chars() does, even if it's a bit hidden.

Taking a look at str::chars:

pub fn chars(&self) -> Chars<'_>

One can see that the string is borrowed. The returned Chars object contains a reference to the original string. That's why we cannot return s.chars() from the closure.

So why is it different in my code?

In the documentation's example, the iterator words.iter() actually iterates over items of the type &&'static str. Calling s.chars() will also return a Chars object that borrows some string, but that string's lifetime is 'static (lives forever), so there is no problem with returning Chars from that closure.

Solution?

It would be great if the standard library had an OwnedChars iterator that consumes a String, works like Chars and drops the string once the iterator is dropped. In that case it's fine to call s.owned_chars() because the returned object does not reference the local s, but owns it. But: such an owned iterator does not exist in the standard library!

I'm assuming creating unneeded vectors which will be immediately destroyed is inefficient

Yes, that is true in a way. But you might have missed that the reader.lines() iterator also creates temporary objects of type String. Those are more or less immediately destroyed as well! So even without the collect in the flat_map you have a bunch of unnecessary allocations. Note that sometimes that's Ok. In this case, I guess that input parsing is very fast in comparison to the actual algorithm you have to implement. So ... just collect? It's probably fine in this case.

If you want to have high performance input parsing, I think you will not be able to avoid a standard loop, in particular in order to avoid unnecessary String allocations. (Playground)

let mut line = String::new();
let mut input = Vec::new();
loop {
    line.clear(); // clear contents, but keep memory buffer

    // TODO: handle IO error properly
    let bytes_read = reader.read_line(&mut line).expect("IO error"); 
    if bytes_read == 0 {
        break;
    }

    if line.starts_with("Grid") {
        continue;
    }

    // TODO: handle invalid input error
    input.extend(line.trim().chars().map(|c| c.to_digit(10).unwrap()));
}
like image 116
Lukas Kalbertodt Avatar answered Sep 30 '22 15:09

Lukas Kalbertodt


In addition to the other answer, note that an owned iterator is easy to write:

struct OwnedChars {
    s: String,
    i: usize,
}

impl Iterator for OwnedChars {
    type Item = char;

    fn next(&mut self) -> Option<Self::Item> {
        let c = self.s[self.i..].chars().next()?;

        self.i += c.len_utf8();

        Some(c)
    }
}

fn into_iter(string: String) -> OwnedChars {
    OwnedChars {
        s: string,
        i: 0,
    }
}

fn main() {
    let owned_iter = into_iter("Zażółć gęślą jaźń".into());

    for c in owned_iter {
        println!("{}", c);
    }
}

And then, you don't need to collect:

fn main() {
    use std::io::prelude::*;

    let file = std::fs::File::open(std::env::args().nth(1).unwrap()).expect("Sudoku file not found");
    let reader = std::io::BufReader::new(file);

    let x = reader
        .lines()
        .map(|x| x.unwrap())
        .filter(|x| !x.starts_with("Grid"))
        .flat_map(into_iter)
        .map(|x| x.to_digit(10).unwrap())
        .collect::<Vec<_>>();
}
like image 37
Boiethios Avatar answered Sep 30 '22 13:09

Boiethios