Here's a simple program that reads lines from a file and then splits each line into tokens separated by whitespace. The file may be large, so I'd like the function to return an iterator of Strings:
use std::fs::{File};
use std::io::{BufRead, BufReader};
/// Read tokens from a file
fn read_tokens(filename: &str) -> impl Iterator<Item=String> {
    let file = File::open(filename).unwrap();
    BufReader::new(file).lines()
        .map(|l| l.unwrap())
        .flat_map(|l| l.split_whitespace().map(|s| s.to_string()))
}
This fails compilation with
error[E0515]: cannot return value referencing function parameter `l`
 --> src/read.rs:8:23
  |
8 |         .flat_map(|l| l.split_whitespace().map(|s| s.to_string()))
  |                       -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |                       |
  |                       returns a value referencing data owned by the current function
  |                       `l` is borrowed here
  |
  = help: use `.collect()` to allocate the iterator
My reading is that str.split_whitespace() returns subslices of the original slice, i.e. references to strings owned by this function. I expected those subslices to be turned into owned Strings by s.to_string(), but that clearly is not happening. Collecting is out of the question because the file may be too large.
Is there anything short of creating a new type that "joins" the two Iterators into a new Iterator<Item=String> manually?
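For context on the error: the closure passed to flat_map owns `l`, but the iterator it returns is lazy and still borrows `l` through split_whitespace(). The to_string() calls would only run when that iterator is polled later, after `l` has already been dropped at the end of the closure. Below is a minimal sketch of the simplest workaround, assuming it is acceptable to allocate one Vec per line (never one for the whole file). The name tokens_per_line and the generic reader parameter are illustrative choices, not part of the original code.

```rust
use std::io::BufRead;

/// Hypothetical helper: yields whitespace-separated tokens from any buffered reader.
fn tokens_per_line<R: BufRead>(reader: R) -> impl Iterator<Item = String> {
    reader
        .lines()
        .map(|l| l.unwrap())
        .flat_map(|l| {
            // Collect the tokens of this one line into owned Strings
            // before `l` is dropped at the end of the closure.
            l.split_whitespace()
                .map(|s| s.to_string())
                .collect::<Vec<_>>()
        })
}
```

Only one line's worth of tokens is held in memory at a time, so this should still work for large files. It can be called with a file as tokens_per_line(BufReader::new(file)).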
You can avoid a couple of allocations compared to prog-fh's answer by building your own iterator from the String of each line:
use std::io::{BufRead as _, BufReader};
fn read_tokens() -> impl Iterator<Item = String> {
    let reader = BufReader::new("foo bara\nbaz bof".as_bytes());
    reader.lines().map_while(Result::ok).flat_map(|mut l| {
        std::iter::from_fn(move || {
            let mut parts = l.split_whitespace();
            let part = parts.next()?;
            // byte range of the first token; it does not start at 0 if the line has leading whitespace
            let part_start = part.as_ptr() as usize - l.as_ptr() as usize;
            let part_end = part_start + part.len();
            // with #![feature(str_split_whitespace_remainder)]
            // let rest_start = l.len() - parts.remainder().map_or(0, str::len);
            let rest_start = parts.next().map_or(
                l.len(),
                |next| next.as_ptr() as usize - l.as_ptr() as usize,
            );
            let rest = l.split_off(rest_start);
            l.truncate(part_end); // might want to .shrink_to_fit() to discard excess capacity
            l.replace_range(..part_start, ""); // drop leading whitespace, if any
            Some(std::mem::replace(&mut l, rest))
        })
    })
}
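A variation on the same idea, sketched on the assumption that one fresh allocation per token is acceptable: rather than splitting the line's String apart, keep the line intact and track a byte offset into it. The name tokens_by_offset and the generic reader parameter are illustrative, not taken from the code above.

```rust
use std::io::BufRead;

/// Hypothetical helper: yields whitespace-separated tokens, one line at a time.
fn tokens_by_offset<R: BufRead>(reader: R) -> impl Iterator<Item = String> {
    reader.lines().map_while(Result::ok).flat_map(|line| {
        // Byte offset of the part of `line` that has not been tokenized yet.
        let mut pos = 0;
        std::iter::from_fn(move || {
            // Find the next token in the unread tail of the line.
            let token = line[pos..].split_whitespace().next()?;
            // Translate the token's position back into an offset within `line`.
            let start = token.as_ptr() as usize - line.as_ptr() as usize;
            pos = start + token.len();
            Some(token.to_string())
        })
    })
}
```

The closure owns both the line and the offset, so nothing borrowed escapes it. Each token is copied exactly once, and the remainder of the line is never moved.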