 

flat_map'ing non-consuming iterator into consuming iterator in Rust

Tags:

rust

Here's a simple program that reads lines from a file and then splits each line into tokens separated by whitespace. The file may be large, so I'd like the function to return an iterator of Strings:

use std::fs::File;
use std::io::{BufRead, BufReader};
/// Read tokens from a file
fn read_tokens(filename: &str) -> impl Iterator<Item=String> {
    let file = File::open(filename).unwrap();
    BufReader::new(file).lines()
        .map(|l| l.unwrap())
        .flat_map(|l| l.split_whitespace().map(|s| s.to_string()))
}

This fails to compile with:

error[E0515]: cannot return value referencing function parameter `l`
 --> src/read.rs:8:23
  |
8 |         .flat_map(|l| l.split_whitespace().map(|s| s.to_string()))
  |                       -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |                       |
  |                       returns a value referencing data owned by the current function
  |                       `l` is borrowed here
  |
  = help: use `.collect()` to allocate the iterator

My reading is that str.split_whitespace() returns subslices of the original slice, i.e. references to strings owned by this function. I expected those subslices to be turned into owned Strings by s.to_string(), but that clearly is not happening. Collecting is out of the question because the file may be too large.
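One fact worth noting here: the s.to_string() calls in the map only run later, when flat_map pulls from the iterator the closure returned, and by then l has already been dropped. The compiler's .collect() hint also does not have to mean collecting the whole file. Collecting inside the flat_map closure buffers only one line's tokens at a time, and the resulting Vec<String> owns its data, so it no longer borrows l. A minimal sketch, assuming a hypothetical generic read_tokens that takes any buffered reader (for example a BufReader<File>) instead of a file name, so it can run without a file on disk:

```rust
use std::io::BufRead;

/// Hypothetical generic variant of the question's read_tokens: it takes any
/// buffered reader (e.g. BufReader<File>) instead of a file name.
fn read_tokens<R: BufRead>(reader: R) -> impl Iterator<Item = String> {
    reader
        .lines()
        .map(|l| l.unwrap())
        .flat_map(|l| {
            // Collect only this line's tokens. The Vec<String> owns its data,
            // so it does not borrow `l`, which is dropped when this closure returns.
            l.split_whitespace()
                .map(str::to_string)
                .collect::<Vec<String>>()
        })
}
```

Memory use is then bounded by the longest line rather than by the file size, at the cost of one Vec allocation per line.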

Is there anything short of creating a new type that "joins" the two Iterators into a new Iterator<Item=String> manually?
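For comparison, the hand-written type the question hopes to avoid stays fairly small if it owns the line and tracks a byte offset into it, because then nothing borrowed outlives a call to next(). A sketch only; LineTokens and the generic read_tokens signature are hypothetical names chosen for illustration:

```rust
use std::io::BufRead;

/// Hypothetical hand-written iterator: it owns the current line plus a byte
/// offset, so no borrow of the line escapes next().
struct LineTokens {
    line: String,
    pos: usize,
}

impl Iterator for LineTokens {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        let rest = &self.line[self.pos..];
        // Skip whitespace before the next token (same definition of
        // whitespace as split_whitespace).
        let trimmed = rest.trim_start();
        if trimmed.is_empty() {
            self.pos = self.line.len();
            return None;
        }
        let start = self.pos + (rest.len() - trimmed.len());
        // The token ends at the next whitespace char or at the end of the line.
        let len = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        self.pos = start + len;
        Some(self.line[start..start + len].to_string())
    }
}

/// Hypothetical generic reader-based version of the question's function.
fn read_tokens<R: BufRead>(reader: R) -> impl Iterator<Item = String> {
    reader
        .lines()
        .map(|l| l.unwrap())
        .flat_map(|line| LineTokens { line, pos: 0 })
}
```

Each token is still copied out with to_string, so this keeps one allocation per token, just like the map in the question.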

asked Apr 24 '26 21:04 by Igor Urisman


1 Answer

Compared to prog-fh's answer, you can avoid a couple of allocations by building your own iterator directly from the String of each line:

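use std::io::{BufRead as _, BufReader};

fn read_tokens() -> impl Iterator<Item = String> {
    // In-memory input for demonstration; &[u8] implements Read.
    let reader = BufReader::new("foo bara\nbaz bof".as_bytes());
    reader.lines().map_while(Result::ok).flat_map(|mut l| {
        // `move` gives the closure ownership of the line, so nothing it
        // yields borrows from a value that has been dropped.
        std::iter::from_fn(move || {
            let mut parts = l.split_whitespace();
            let part = parts.next()?;
            // Byte range of the current token; the start is only nonzero
            // when the line begins with whitespace.
            let part_start = part.as_ptr() as usize - l.as_ptr() as usize;
            let part_end = part_start + part.len();
            // with #![feature(str_split_whitespace_remainder)]
            // let rest_start = l.len() - parts.remainder().map_or(0, str::len);
            let rest_start = parts.next().map_or(
                l.len(),
                |next| next.as_ptr() as usize - l.as_ptr() as usize,
            );
            // Move everything from the next token on into a new String...
            let rest = l.split_off(rest_start);
            // ...and shrink `l` to just the current token.
            l.truncate(part_end); // might want to .shrink_to_fit() to discard excess capacity
            l.replace_range(..part_start, "");
            // Yield the token and keep the rest for the next call.
            Some(std::mem::replace(&mut l, rest))
        })
    })
}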
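To fit the question's file-based read_tokens(filename: &str), the same split_off / truncate / mem::replace idea can be factored into a per-line helper and fed from a BufReader<File>. A sketch, assuming it is acceptable to return the File::open error instead of unwrapping it; tokens_of is a hypothetical helper name:

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Hypothetical helper: yields the whitespace-separated tokens of one owned
/// line, handing out the line's own buffer for the first token and the
/// split-off remainder for each following one.
fn tokens_of(mut line: String) -> impl Iterator<Item = String> {
    std::iter::from_fn(move || {
        let base = line.as_ptr() as usize;
        let mut parts = line.split_whitespace();
        let part = parts.next()?;
        // Byte range of the current token within `line`.
        let part_start = part.as_ptr() as usize - base;
        let part_end = part_start + part.len();
        // Start of the next token, or end of line if there is none.
        let rest_start = parts
            .next()
            .map_or(line.len(), |next| next.as_ptr() as usize - base);
        let rest = line.split_off(rest_start);
        line.truncate(part_end);
        line.replace_range(..part_start, ""); // drop leading whitespace, if any
        Some(std::mem::replace(&mut line, rest))
    })
}

/// Same shape as the question's read_tokens, but returning the open error
/// instead of panicking. A line that fails to read ends the iteration.
fn read_tokens(filename: &str) -> io::Result<impl Iterator<Item = String>> {
    let file = File::open(filename)?;
    Ok(BufReader::new(file)
        .lines()
        .map_while(Result::ok)
        .flat_map(tokens_of))
}
```

Only one line is held in memory at a time, so this stays lazy for large files.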
answered Apr 26 '26 11:04 by cafce25


