Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there any advantage of calling map after mapToInt, where ever required

I am trying to calculate the sum of squares of values in the list. Below are three variations which all calculates the required value. I want to know which one is the most efficient. I am expecting the third one to be more efficient as auto boxing is only done once.

    // sum of squares
    int sum = list.stream().map(x -> x * x).reduce((x, y) -> x + y).get();
    System.out.println("sum of squares: " + sum);

    sum = list.stream().mapToInt(x -> x * x).sum();
    System.out.println("sum of squares: " + sum);

    sum = list.stream().mapToInt(x -> x).map(x -> x * x).sum();
    System.out.println("sum of squares: " + sum);
like image 478
Rob Wilkinson Avatar asked Sep 08 '15 16:09

Rob Wilkinson


People also ask

What's the difference between map and flatMap?

The difference is that the map operation produces one output value for each input value, whereas the flatMap operation produces an arbitrary number (zero or more) values for each input value.

What does flat map do?

flatMap() is an intermediate operation and return a new Stream. It returns a Stream consisting of the results of replacing each element of the given stream with the contents of a mapped stream produced by applying the provided mapping function to each element.

What is the use of map function in Java?

The map() function is a method in the Stream class that represents a functional programming concept. In simple words, the map() is used to transform one object into other by applying a function. That's why the Stream.


2 Answers

When in doubt, test! Using jmh, I get the following results on a list of 100k elements (in microseconds, smaller is better):

Benchmark                        Mode  Samples     Score    Error  Units
c.a.p.SO32462798.for_loop        avgt       10   119.110    0.921  us/op
c.a.p.SO32462798.mapToInt        avgt       10   129.702    1.040  us/op
c.a.p.SO32462798.mapToInt_map    avgt       10   129.753    1.516  us/op
c.a.p.SO32462798.map_reduce      avgt       10  1262.802   12.197  us/op
c.a.p.SO32462798.summingInt      avgt       10   134.821    1.203  us/op

So you have, from faster to slower:

  • for(int i : list) sum += i*i;
  • mapToInt(x -> x * x).sum() and mapToInt(x -> x).map(x -> x * x).sum()
  • collect(Collectors.summingInt(x -> x * x))
  • map(x -> x * x).reduce((x, y) -> x + y).get()

Note that the results are very much dependent on the JIT optimisations. If the logic in the mapping is more complex, some of the optimisations may be unavailable (longer code = less inlining) in which case the streams versions may take 4-5x more time than the for loop - but if that logic is CPU heavy the difference will reduce again. Profiling your actual application will give you more information.


Benchmark code for reference:

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
public class SO32462798 {

  List<Integer> list;

  @Setup public void setup() {
    list = new Random().ints(100_000).boxed().collect(toList());
  }

  @Benchmark public int for_loop() {
    int sum = 0;
    for (int i : list) sum += i * i;
    return sum;
  }

  @Benchmark public int summingInt() {
    return list.stream().collect(Collectors.summingInt(x -> x * x));
  }

  @Benchmark public int mapToInt() {
    return list.stream().mapToInt(x -> x * x).sum();
  }

  @Benchmark public int mapToInt_map() {
    return list.stream().mapToInt(x -> x).map(x -> x * x).sum();
  }

  @Benchmark public int map_reduce() {
    return list.stream().map(x -> x * x).reduce((x, y) -> x + y).get();
  }
}
like image 117
assylias Avatar answered Sep 21 '22 15:09

assylias


I expect the second one to be the fastest.

There is boxing in neither the second nor the third example (if the list contains already boxed elements). But, there is unboxing.

Your second example might have two unboxing (one for every x in x*x), while the third does unboxing only once. However, unboxing is fast and I think it does not worth to optimize that as a longer pipeline with an additional function call will certainly slow it down.

Sidenote: in general, you should not expect Streams to be faster than regular iterations on arrays or lists. When doing mathematical calculations where speed matters (like this) it's better to go the other way: simply iterate through the elements. If your output is an aggregated value, then aggregate it, if it is a mapping, then allocate a new array or list of the same size and fill it with the calculated values.

like image 35
Tamas Hegedus Avatar answered Sep 21 '22 15:09

Tamas Hegedus