Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why Enum.map is more efficient than Enum.count in Elixir?

I've found that counting with Enum.map |> Enum.sum works much faster than Enum.count. But why is not the built-in count function efficient?

Compare these two lines:

elements |> Enum.count(&check/1)
elements |> Enum.map(&(if check(&1), do: 1, else: 0)) |> Enum.sum

The results of tests with different length of list:

len   | map(), µs | count(), µs
===============================
   10 |      0.67 |    2.55
  100 |      5.52 |    8.91
 1000 |     59.00 |   73.12
10000 |    642.50 |  734.60

Source code: https://gist.github.com/artemrizhov/fc146f7ab390f7a807be833099e5cb83

$ elixir --version
Erlang/OTP 19 [erts-8.1] [source-e7be63d] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]

Elixir 1.3.4
like image 899
raacer Avatar asked Dec 16 '16 01:12

raacer


People also ask

What is Enum map Elixir?

In Elixir, an enumerable is any data type that implements the Enumerable protocol. List s ( [1, 2, 3] ), Map s ( %{foo: 1, bar: 2} ) and Range s ( 1..3 ) are common data types used as enumerables: iex> Enum. map([1, 2, 3], fn x -> x * 2 end) [2, 4, 6] iex> Enum. sum([1, 2, 3]) 6 iex> Enum.

When should I use Elixir stream?

Streams are useful when working with huge, potentially infinite data sets. With large data sets, streams are more suitable as they do not fill up the memory with all the data at once, which Enum would do ​due to its intermediate lists.


Video Answer


1 Answers

This is because both Enum.map/2 and Enum.reduce/3 (that is used by sum) are heavily optimised for lists, while Enum.count/2 has only a generic enumerable version.

To add to confusion, there's even a faster implementation:

elements |> Enum.filter(&check/1) |> Enum.count

On my machine, using modified benchmark you provided I get a consistent result of:

len = 10
map:    2.1706 μs
count:  7.0754 μs
filter: 1.9765 μs

len = 100
map:    19.921 μs
count:  27.579 μs
filter: 14.591 μs

len = 1000
map:    168.93 μs
count:  178.93 μs
filter: 163.43 μs

len = 10000
map:    2128.9 μs
count:  1822.1 μs
filter: 1664.6 μs

That said both the "map" and the "filter" version produce more garbage when they operate, so some of the time gains might be consumed by the extended garbage collection time. This is already visible in the benchmarks, as the relative gains between the versions decrease as the length of the length of the list (and the amount of intermediate garbage produced) increase.

EDIT: I submitted a PR that optimises the Enum.count/2 function to be the fastest one https://github.com/elixir-lang/elixir/pull/5567

like image 170
michalmuskala Avatar answered Oct 19 '22 03:10

michalmuskala