How to handle Result in flat_map

Tags:

rust

I know we can use collect to move a Result from inner to outer, like:

fn produce_result(my_struct: &MyStruct) -> Result<MyStruct, Error>;

let my_results: Vec<MyStruct> = vec![];
let res = my_results.iter().map(|my_struct| produce_result(&my_struct)).collect::<Result<Vec<MyStruct>, Error>>();

which propagates the first error from the closure to the outer Result.

However, this method doesn't work in the flat_map case (Rust playground):

fn produce_result(my_struct: &MyStruct) -> Result<Vec<MyStruct>, Error>;

let my_results: Vec<MyStruct> = vec![];
let res = my_results.iter().flat_map(|my_struct| produce_result(&my_struct)).collect::<Result<Vec<MyStruct>, Error>>();

the compiler complains: "a collection of type std::result::Result<std::vec::Vec<MyStruct>, Error> cannot be built from an iterator over elements of type std::vec::Vec<MyStruct>"

How can I work around this?

Evian asked Jan 22 '20 03:01


1 Answer

flat_map "flattens" only the top layer of the value returned from the closure, by calling its IntoIterator implementation. It's important that it doesn't try to reach inside that value. For example, if you had your own MyResult type, which does not implement IntoIterator, the error would come from flat_map itself:

enum Error {}

enum MyResult<T, U> {
    Ok(T),
    Err(U),
}

struct MyStruct;

fn produce_result(item: &MyStruct) -> MyResult<Vec<MyStruct>, Error> {
    MyResult::Ok(vec![])
}

fn main() {
    let my_structs: Vec<MyStruct> = vec![];
    let res = my_structs
        .iter()
        .flat_map(|my_struct| produce_result(&my_struct))
        .collect::<Result<Vec<MyStruct>, Error>>();
}

(Playground)

Error:

error[E0277]: `MyResult<std::vec::Vec<MyStruct>, Error>` is not an iterator
  --> src/main.rs:18:10
   |
18 |         .flat_map(|my_struct| produce_result(&my_struct))
   |          ^^^^^^^^ `MyResult<std::vec::Vec<MyStruct>, Error>` is not an iterator
   |
   = help: the trait `std::iter::Iterator` is not implemented for `MyResult<std::vec::Vec<MyStruct>, Error>`
   = note: required because of the requirements on the impl of `std::iter::IntoIterator` for `MyResult<std::vec::Vec<MyStruct>, Error>`

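In your case, however, the behaviour is different, since Result implements IntoIterator. Its iterator yields the Ok value unchanged and yields nothing for Err, so when flat_mapping the Result you silently ignore every error and only use the results of successful calls. The items coming out of flat_map are therefore plain Vec<MyStruct> values, which is why the compiler says a Result<Vec<MyStruct>, Error> cannot be built from them.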
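To make the silent dropping concrete, here is a minimal sketch. The `halve` helper and the `u32`/`String` types are assumptions made up for illustration; they are not part of the question's code.

```rust
// Hypothetical helper for illustration only: fails on odd inputs.
fn halve(n: u32) -> Result<Vec<u32>, String> {
    if n % 2 == 0 {
        Ok(vec![n / 2])
    } else {
        Err(format!("{} is odd", n))
    }
}

// flat_map iterates each Result as zero or one Vec<u32>,
// so every Err disappears without any signal.
fn surviving_vecs(inputs: &[u32]) -> Vec<Vec<u32>> {
    inputs.iter().flat_map(|&n| halve(n)).collect()
}

// Counts the items a Result yields when iterated: 1 for Ok, 0 for Err.
fn yielded_items(r: Result<u32, String>) -> usize {
    r.into_iter().count()
}
```

Calling `surviving_vecs(&[2, 3, 4])` keeps only the two successful Vecs; the error produced for 3 is never seen by the caller.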

There is a way to fix it, although it is a bit cumbersome. You should explicitly match on the Result, wrapping the Err case in a Vec and "distributing" the Ok case over the already-existing Vec, then let flat_map do its job:

let res = my_structs
    .iter()
    .map(|my_struct| produce_result(&my_struct))
    .flat_map(|result| match result {
        Ok(vec) => vec.into_iter().map(|item| Ok(item)).collect(),
        Err(er) => vec![Err(er)],
    })
    .collect::<Result<Vec<MyStruct>, Error>>();

Playground
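For a version that can be run without the original types, the same match-based approach might look like the sketch below. The `expand` function is a hypothetical stand-in for produce_result, and `u32`/`String` are assumed in place of MyStruct and Error.

```rust
// Hypothetical stand-in for produce_result:
// expands n into n copies of itself and rejects zero.
fn expand(n: u32) -> Result<Vec<u32>, String> {
    if n == 0 {
        Err("zero is not allowed".to_string())
    } else {
        Ok(vec![n; n as usize])
    }
}

fn flatten_all(inputs: &[u32]) -> Result<Vec<u32>, String> {
    inputs
        .iter()
        .map(|&n| expand(n))
        // Turn every Result<Vec<T>, E> into a Vec<Result<T, E>> so that
        // flat_map yields individual Results instead of dropping errors.
        .flat_map(|result| -> Vec<Result<u32, String>> {
            match result {
                Ok(items) => items.into_iter().map(Ok).collect(),
                Err(e) => vec![Err(e)],
            }
        })
        // Collecting into a Result stops at the first Err it meets.
        .collect()
}
```

With these assumptions, `flatten_all(&[1, 2])` succeeds with the flattened items, while `flatten_all(&[2, 0, 3])` returns the error produced for the zero.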

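There's also another way, which might be more performant if errors are indeed present (even if only sometimes). It first collects the inner Vecs into a single Result, returns early with ? on the first error, and only then flattens: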

fn external_collect(my_structs: Vec<MyStruct>) -> Result<Vec<MyStruct>, Error> {
    Ok(my_structs
        .iter()
        .map(|my_struct| produce_result(&my_struct))
        .collect::<Result<Vec<_>, _>>()?
        .into_iter()
        .flatten()
        .collect())
}

Playground
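The two phases are easier to see when split into separate statements. This is only a sketch under the same assumptions as before: `expand` is a hypothetical stand-in for produce_result, and `u32`/`String` replace MyStruct and Error.

```rust
// Hypothetical stand-in for produce_result:
// expands n into n copies of itself and rejects zero.
fn expand(n: u32) -> Result<Vec<u32>, String> {
    if n == 0 {
        Err("zero is not allowed".to_string())
    } else {
        Ok(vec![n; n as usize])
    }
}

fn collect_then_flatten(inputs: &[u32]) -> Result<Vec<u32>, String> {
    // Phase 1: stop at the first Err; otherwise keep every inner Vec.
    let nested = inputs
        .iter()
        .map(|&n| expand(n))
        .collect::<Result<Vec<_>, _>>()?;
    // Phase 2: only reached on success, so flattening cannot fail.
    Ok(nested.into_iter().flatten().collect())
}
```

The trade-off is the intermediate Vec of Vecs in phase 1, which is allocated in full whenever every call succeeds.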

I've done some quick benchmarking. The code is on the playground, too, although it can't be run there because the cargo bench command is not available, so I ran the benchmarks locally. Here are the results:

test vec_result::external_collect_end_error   ... bench:   2,759,002 ns/iter (+/- 1,035,039)
test vec_result::internal_collect_end_error   ... bench:   3,502,342 ns/iter (+/- 438,603)

test vec_result::external_collect_start_error ... bench:          21 ns/iter (+/- 6)
test vec_result::internal_collect_start_error ... bench:          30 ns/iter (+/- 19)

test vec_result::external_collect_no_error    ... bench:   7,799,498 ns/iter (+/- 815,785)
test vec_result::internal_collect_no_error    ... bench:   3,489,530 ns/iter (+/- 170,124)

It seems that the version with two chained collects (external_collect) takes roughly twice as long as the version with the nested collect (internal_collect) when execution is successful, but is substantially faster (by roughly 20 to 30 percent in these runs) when execution short-circuits on an error. This result is consistent over multiple benchmark runs, so the large variance reported probably doesn't really matter.

Cerberus answered Sep 21 '22 04:09