Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala: Can there be any reason to prefer `filter+map` over `collect`?

Tags:

scala

Can there be any reason to prefer filter+map:

list.filter (i => aCondition(i)).map(i => fun(i)) 

over collect? :

list.collect(case i if aCondition(i) => fun(i)) 

The one with collect (single look) looks faster and cleaner to me. So I would always go for collect.

like image 791
Daniel Avatar asked May 01 '16 00:05

Daniel


People also ask

What is filter in Scala map?

Scala Map filter () method with example Last Updated : 13 Aug, 2019 The filter () method is utilized to select all elements of the map which satisfies a stated predicate. Method Definition: def filter (p: ((A, B))=> Boolean): Map [A, B]

What is the difference between'filter'and'mapping'across a collection?

'Mapping' across a collection uses the map function to transform each element of that collection in a similar way. The general syntax is: filter is used when you want to exclude or 'filter out' certain elements of a collection. As with map, the general syntax takes a function, but that function must return a Boolean:

What is the use of filter () method in map?

The filter () method is utilized to select all elements of the map which satisfies a stated predicate. Return Type: It returns a new map consisting all the elements of the map which satisfies the given predicate.

How to filter more than one filter over the same collection?

Using more than more filter over the same collection. If any of the condition fails it returns an null or empty value. We can even pass the map function also over the filtered values to get desired results. This first filters out the data and then a map operation is performed with the filtered data. Using Filter Operation with Map Function.


2 Answers

Most of Scala's collections eagerly apply operations and (unless you're using a macro library that does this for you) will not fuse operations. So filter followed by map will usually create two collections (and even if you use Iterator or somesuch, the intermediate form will be transiently created, albeit only an element at a time), whereas collect will not.

On the other hand, collect uses a partial function to implement the joint test, and partial functions are slower than predicates (A => Boolean) at testing whether something is in the collection.

Additionally, there can be cases where it is simply clearer to read one than the other and you don't care about performance or memory usage differences of a factor of 2 or so. In that case, use whichever is clearer. Generally if you already have the functions named, it's clearer to read

xs.filter(p).map(f) xs.collect{ case x if p(x) => f(x) } 

but if you are supplying the closures inline, collect generally looks cleaner

xs.filter(x < foo(x, x)).map(x => bar(x, x)) xs.collect{ case x if foo(x, x) => bar(x, x) } 

even though it's not necessarily shorter, because you only refer to the variable once.

Now, how big is the difference in performance? That varies, but if we consider a a collection like this:

val v = Vector.tabulate(10000)(i => ((i%100).toString, (i%7).toString)) 

and you want to pick out the second entry based on filtering the first (so the filter and map operations are both really easy), then we get the following table.

Note: one can get lazy views into collections and gather operations there. You don't always get your original type back, but you can always use to get the right collection type. So xs.view.filter(p).map(f).toVector would, because of the view, not create an intermediate. That is tested below also. It has also been suggested that one can xs.flatMap(x => if (p(x)) Some(f(x)) else None) and that this is efficient. That is not so. It's also tested below. And one can avoid the partial function by explicitly creating a builder: val vb = Vector.newBuilder[String]; xs.foreach(x => if (p(x)) vb += f(x)); vb.result, and the results for that are also listed below.

In the table below, three conditions have been tested: filter out nothing, filter out half, filter out everything. The times have been normalized to filter/map (100% = same time as filter/map, lower is better). Error bounds are around +- 3%.

Performance of different filter/map alternatives

====================== Vector ======================== filter/map   collect  view filt/map  flatMap   builder    100%        44%          64%        440%      30%    filter out none    100%        60%          76%        605%      42%    filter out half    100%       112%         103%       1300%      74%    filter out all 

Thus, filter/map and collect are generally pretty close (with collect winning when you keep a lot), flatMap is far slower under all situations, and creating a builder always wins. (This is true specifically for Vector. Other collections may have somewhat different characteristics, but the trends for most will be similar because the differences in operations are similar.) Views in this test tend to be a win, but they don't always work seamlessly (and they aren't really better than collect except for the empty case).

So, bottom line: prefer filter then map if it aids clarity when speed doesn't matter, or prefer it for speed when you're filtering out almost everything but still want to keep things functional (so don't want to use a builder); and otherwise use collect.

like image 144
Rex Kerr Avatar answered Sep 20 '22 23:09

Rex Kerr


I guess this is rather opinion based, but given the following definitions:

scala> val l = List(1,2,3,4) l: List[Int] = List(1, 2, 3, 4)  scala> def f(x: Int) = 2*x f: (x: Int)Int  scala> def p(x: Int) = x%2 == 0 p: (x: Int)Boolean 

Which of the two do you find nicer to read:

l.filter(p).map(f) 

or

l.collect{ case i if p(i) => f(i) } 

(Note that I had to fix your syntax above, as you need the bracket and case to add the if condition).

I personally find the filter+map much nicer to read and understand. It's all a matter of the exact syntax that you use, but given p and f, you don't have to write anonymous functions when using filter or map, while you do need them when using collect.

You also have the possibility to use flatMap:

l.flatMap(i => if(p(i)) Some(f(i)) else None) 

Which is likely to be the most efficient among the 3 solutions, but I find it less nice than map and filter.

Overall, it's very difficult to say which one would be faster, as it depends a lot of which optimizations end up being performed by scalac and then the JVM. All 3 should be pretty close, and definitely not a factor in deciding which one to use.

like image 20
Regis Blanc Avatar answered Sep 19 '22 23:09

Regis Blanc