Following on from https://doc.rust-lang.org/rust-by-example/std_misc/file/read_lines.html, I would like to define a function that accepts an iterable of paths and returns a reader that wraps all of them into a single stream. My attempt, which does not compile:
fn read_lines<P, I: IntoIterator<Item = P>>(files: I) -> Result<io::Lines<io::BufReader<File>>>
where
    P: AsRef<Path>,
{
    let handles = files.into_iter()
        .map(|path|
            File::open(path).unwrap());
    // I guess it is hard (impossible?) to define the type of this reduction,
    // Chain<File, Chain<File, ..., Chain<File, File>>>
    // and that is the reason the compiler is complaining.
    match handles.reduce(|a, b| a.chain(b)) {
        Some(combination) => Ok(BufReader::new(combination).lines()),
        None => {
            // Not nice, hard fail if the array len is 0
            Ok(BufReader::new(handles.next().unwrap()).lines())
        },
    }
}
As expected, this fails to compile, and I am unsure how to address the error:
error[E0599]: the method `chain` exists for struct `File`, but its trait bounds were not satisfied
--> src/bin.rs:136:35
|
136 | match handles.reduce(|a, b| a.chain(b)) {
| ^^^^^ method cannot be called on `File` due to unsatisfied trait bounds
|
::: /home/test/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/fs.rs:91:1
|
91 | pub struct File {
| --------------- doesn't satisfy `File: Iterator`
|
::: /home/test/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/io/mod.rs:902:8
|
902 | fn chain<R: Read>(self, next: R) -> Chain<Self, R>
| ----- the method is available for `Box<File>` here
|
= note: the following trait bounds were not satisfied:
`File: Iterator`
which is required by `&mut File: Iterator`
= help: items from traits can only be used if the trait is in scope
help: the following trait is implemented but not in scope; perhaps add a `use` for it:
|
1 | use std::io::Read;
|
error: aborting due to previous error
I've tried contorting the code with Boxes, without success, but it seems the fundamental issue is that the type of this reduction is "undefined": Chain<File, Chain<File, ..., Chain<File, File>>>, IIUC. How would a Rustacean define a function like this? Is it possible without using dynamic "boxes"?
I guess it is hard (impossible?) to define the type of this reduction,
Chain<File, Chain<File, ..., Chain<File, File>>>. [...] How would a Rustacean define a method like this?
The combinator you are looking for is flat_map:
let handles = files.into_iter().map(|path| File::open(path).unwrap());
handles.flat_map(|handle| BufReader::new(handle).lines())
Also, your return type is unnecessarily specific: it commits to a particular implementation of both the iterator over the handles and the iterator over the lines coming from each handle. Even if you get it to work, the signature of your function will be tightly coupled to its implementation, meaning you won't be able to, e.g., switch to a more efficient approach without introducing a breaking change to the API.
To avoid such coupling, you can use an impl Trait return type. That way the signature of your function only promises that the type of the returned value will implement Iterator. The function could then look like this:
fn read_lines<P, I: IntoIterator<Item = P>>(files: I) -> impl Iterator<Item = io::Result<String>>
where
    P: AsRef<Path>,
{
    let handles = files.into_iter().map(|path| File::open(path).unwrap());
    handles.flat_map(|handle| BufReader::new(handle).lines())
}
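The function above still panics if a file cannot be opened, because of the unwrap(). As a minimal sketch of one way around that, the variant below keeps the flat_map approach but reports an open failure as a single Err item in the stream. The name read_lines_checked and the decision to box each per-file iterator are assumptions made for illustration; they are not part of the original answer.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Hypothetical variant of `read_lines`: a file that fails to open shows up
/// as one `Err` item instead of causing a panic.
fn read_lines_checked<P, I>(files: I) -> impl Iterator<Item = io::Result<String>>
where
    P: AsRef<Path>,
    I: IntoIterator<Item = P>,
{
    files.into_iter().flat_map(|path| {
        // Both match arms must have the same type, so each per-file
        // iterator is boxed as a trait object.
        let lines: Box<dyn Iterator<Item = io::Result<String>>> = match File::open(path) {
            Ok(file) => Box::new(BufReader::new(file).lines()),
            Err(err) => Box::new(std::iter::once(Err::<String, io::Error>(err))),
        };
        lines
    })
}
```

Because map and flat_map are lazy, each file is opened only when the lines of the previous one have been exhausted. The box here is one allocation per file, not per line, and the boxes are not nested inside each other.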
Finally, if you really want to combine reduce and chain, you can do that too. Your intuition that you need to use a Box is correct, but it is much easier to use fold() than reduce():
handles.fold(
    Box::new(std::iter::empty()) as Box<dyn Iterator<Item = _>>,
    |iter, handle| Box::new(iter.chain(BufReader::new(handle).lines())),
)
Folding starts with an empty iterator, boxed and cast to a trait object, and proceeds to chain the lines of each handle onto the end of the previous iterator chain. Each result of the chain is boxed so that its type is erased to Box<dyn Iterator<Item = io::Result<String>>>, which eliminates the recursion at the type level. The return type of the function can be either impl Iterator or Box<dyn Iterator>; both will compile.
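Put together as a complete function, the fold-based approach might look like the sketch below. The names read_lines_folded and BoxedLines are made up for this example, and the closure's parameter and return types are written out explicitly only to make the boxing visible.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Type-erased iterator over lines (alias name is illustrative).
type BoxedLines = Box<dyn Iterator<Item = io::Result<String>>>;

fn read_lines_folded<P, I>(files: I) -> BoxedLines
where
    P: AsRef<Path>,
    I: IntoIterator<Item = P>,
{
    let handles = files.into_iter().map(|path| File::open(path).unwrap());
    handles.fold(
        // Start from an empty iterator, so zero files is not a special case.
        Box::new(std::iter::empty()) as BoxedLines,
        // Wrap the chain built so far together with the next file's lines.
        |iter: BoxedLines, handle: File| -> BoxedLines {
            Box::new(iter.chain(BufReader::new(handle).lines()))
        },
    )
}
```

Unlike the flat_map version, fold consumes the whole handles iterator before returning, so every file is opened up front, before the first line is read.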
Note that this solution is inefficient, not just due to boxing, but also because the final iterator will wrap all the previous ones. Although the recursion is not visible from the erased types, it's there in the implementation, and the final next() will internally have to go through all the stacked iterators, possibly even blowing up the stack if there is a sufficient number of files. The solution based on flat_map() doesn't have that issue.
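As for the compiler error in the question: std::io::Read was not in scope, so the only chain the compiler considered was Iterator::chain, which File does not implement. If you actually want a single byte stream rather than an iterator of lines, a rough sketch using Read::chain with boxing could look like this. The name open_concatenated is hypothetical, and this variant has the same nesting drawback as the fold-based iterator above.

```rust
use std::fs::File;
use std::io::{self, BufReader, Read};
use std::path::Path;

/// Hypothetical helper: concatenates the given files into one buffered reader.
fn open_concatenated<P, I>(files: I) -> io::Result<BufReader<Box<dyn Read>>>
where
    P: AsRef<Path>,
    I: IntoIterator<Item = P>,
{
    // Start from an empty reader, so zero paths is not a special case.
    let mut combined: Box<dyn Read> = Box::new(io::empty());
    for path in files {
        let file = File::open(path)?;
        // Fully qualified call, so it cannot be confused with `Iterator::chain`.
        combined = Box::new(Read::chain(combined, file));
    }
    Ok(BufReader::new(combined))
}
```

With std::io::BufRead in scope, calling .lines() on the result iterates over the concatenated bytes. One behavioral difference from the flat_map version: if a file does not end with a newline, its last line is joined with the first line of the next file, because the split into lines happens after concatenation.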