
Use Java stream API to summarize data like SQL GROUP BY

With SQL I can write this to summarize data:

SELECT sum(f1), sum(f2), f3, f4 FROM TABLEX GROUP BY f3, f4

This returns a list with 4 values in each row: sum1, sum2, v3, v4

For example, suppose the table contains:

1, 2, a, b
1, 2, a, b
2, 2, c, d
2, 2, c, d
3, 4, c, d

The result will be:

2, 4, a, b
7, 8, c, d

Now assume that instead of a database table, I have the data in a Java List of objects with fields f1, f2, f3, f4. Is there a function in the Java Stream API that can summarize this list grouped by f3 and f4?

Bruce asked May 22 '15 16:05



2 Answers

Doing this with only the JDK's Stream API isn't really straightforward, as other answers have shown. This article explains how you can achieve the SQL semantics of GROUP BY in Java 8 (with standard aggregate functions) and by using jOOλ, a library that extends Stream for these use cases. This article shows how other SQL clauses can be mapped to Java 8 Streams.

Given your SQL query:

SELECT sum(f1), sum(f2), f3, f4 FROM TABLEX GROUP BY f3, f4

You'd first have to create a named tuple X (as also explained in this answer):

public class X {
    final Double f1;
    final Double f2;
    final Object f3;
    final Object f4;

    public X(Double f1, Double f2, Object f3, Object f4) {
        this.f1 = f1;
        this.f2 = f2;
        this.f3 = f3;
        this.f4 = f4;
    }
}

And then, write:

import static org.jooq.lambda.tuple.Tuple.tuple;

import java.util.Map;
import java.util.stream.Collectors;

import org.jooq.lambda.Seq;
import org.jooq.lambda.tuple.Tuple;
import org.jooq.lambda.tuple.Tuple2;

// ...

Map<
    Tuple2<Object, Object>, 
    Tuple2<Double, Double>
> map =

// FROM TABLEX
Seq.of(
    new X(1.0, 2.0, "a", "b"),
    new X(1.0, 2.0, "a", "b"),
    new X(2.0, 2.0, "c", "d"),
    new X(2.0, 2.0, "c", "d"),
    new X(3.0, 4.0, "c", "d"))

// GROUP BY f3, f4
.groupBy(
    x -> tuple(x.f3, x.f4),

// SELECT SUM(f1), SUM(f2)
    Tuple.collectors(
        Collectors.summingDouble(x -> x.f1),
        Collectors.summingDouble(x -> x.f2)
    )
);

A call to

System.out.println(map);

Will then yield

{(c, d)=(7.0, 8.0), (a, b)=(2.0, 4.0)}
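As an aside for readers on Java 12 or later: the JDK's own Collectors.teeing can play the role of Tuple.collectors when exactly two aggregates are needed. The following is a minimal sketch, not part of the original answer. It assumes a hypothetical Row class mirroring X, uses a List of (f3, f4) as the grouping key in place of a tuple, and returns each pair of sums as a two-element List.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TeeingGroupBy {

    // Hypothetical row type with the same fields as X.
    static final class Row {
        final double f1;
        final double f2;
        final Object f3;
        final Object f4;

        Row(double f1, double f2, Object f3, Object f4) {
            this.f1 = f1;
            this.f2 = f2;
            this.f3 = f3;
            this.f4 = f4;
        }
    }

    // GROUP BY f3, f4 with SUM(f1), SUM(f2).
    // Each map value is the list [sum of f1, sum of f2] for that group.
    static Map<List<Object>, List<Double>> summarize(Stream<Row> rows) {
        return rows.collect(Collectors.groupingBy(
            (Row r) -> Arrays.asList(r.f3, r.f4),
            Collectors.teeing(
                Collectors.summingDouble((Row r) -> r.f1),
                Collectors.summingDouble((Row r) -> r.f2),
                (Double s1, Double s2) -> Arrays.asList(s1, s2))));
    }

    public static void main(String[] args) {
        Map<List<Object>, List<Double>> result = summarize(Stream.of(
            new Row(1, 2, "a", "b"),
            new Row(1, 2, "a", "b"),
            new Row(2, 2, "c", "d"),
            new Row(2, 2, "c", "d"),
            new Row(3, 4, "c", "d")));

        // Contains [a, b]=[2.0, 4.0] and [c, d]=[7.0, 8.0];
        // the iteration order of the map is not guaranteed.
        System.out.println(result);
    }
}
```

The lambda parameter types are written out explicitly so that the compiler does not have to infer them through the nested generic calls. For more than two aggregates, teeing calls would have to be nested, which is where a tuple collector becomes more convenient.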

Without the jOOλ dependency

If you don't want to add an external dependency, all you have to do is create your own Tuple2 class:

public class Tuple2<T1, T2> {
    final T1 v1;
    final T2 v2;
    // Constructors, a static tuple(v1, v2) factory, toString(), equals(), hashCode(), etc.
}

... add your own tuple collector:

static <T, A1, A2, D1, D2> Collector<T, Tuple2<A1, A2>, Tuple2<D1, D2>> collectors(
    Collector<T, A1, D1> collector1
  , Collector<T, A2, D2> collector2
) {
    return Collector.<T, Tuple2<A1, A2>, Tuple2<D1, D2>>of(
        () -> tuple(
            collector1.supplier().get()
          , collector2.supplier().get()
        ),
        (a, t) -> {
            collector1.accumulator().accept(a.v1, t);
            collector2.accumulator().accept(a.v2, t);
        },
        (a1, a2) -> tuple(
            collector1.combiner().apply(a1.v1, a2.v1)
          , collector2.combiner().apply(a1.v2, a2.v2)
        ),
        a -> tuple(
            collector1.finisher().apply(a.v1)
          , collector2.finisher().apply(a.v2)
        )
    );
}

Then replace Seq with Stream, and groupBy() with collect() and Collectors.groupingBy().
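Put together, the dependency-free variant might look like the sketch below. It is an illustration under assumptions rather than the answer's exact code: Pair is a hypothetical stand-in for the hand-written Tuple2, Row is a hypothetical class with the same fields as X, and both(...) is an assumed name for the tuple collector.

```java
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamGroupBy {

    // Hypothetical immutable pair; equals() and hashCode() are required
    // because instances are used as map keys.
    static final class Pair<A, B> {
        final A v1;
        final B v2;

        Pair(A v1, B v2) {
            this.v1 = v1;
            this.v2 = v2;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Pair)) return false;
            Pair<?, ?> p = (Pair<?, ?>) o;
            return Objects.equals(v1, p.v1) && Objects.equals(v2, p.v2);
        }

        @Override
        public int hashCode() {
            return Objects.hash(v1, v2);
        }

        @Override
        public String toString() {
            return "(" + v1 + ", " + v2 + ")";
        }
    }

    // Hypothetical row type with the same fields as X.
    static final class Row {
        final double f1;
        final double f2;
        final Object f3;
        final Object f4;

        Row(double f1, double f2, Object f3, Object f4) {
            this.f1 = f1;
            this.f2 = f2;
            this.f3 = f3;
            this.f4 = f4;
        }
    }

    // Feeds every element to two collectors and pairs up their results.
    static <T, A1, A2, D1, D2> Collector<T, Pair<A1, A2>, Pair<D1, D2>> both(
            Collector<T, A1, D1> c1, Collector<T, A2, D2> c2) {
        return Collector.<T, Pair<A1, A2>, Pair<D1, D2>>of(
            () -> new Pair<>(c1.supplier().get(), c2.supplier().get()),
            (acc, t) -> {
                c1.accumulator().accept(acc.v1, t);
                c2.accumulator().accept(acc.v2, t);
            },
            (x, y) -> new Pair<>(
                c1.combiner().apply(x.v1, y.v1),
                c2.combiner().apply(x.v2, y.v2)),
            acc -> new Pair<>(
                c1.finisher().apply(acc.v1),
                c2.finisher().apply(acc.v2)));
    }

    // GROUP BY f3, f4 with SUM(f1), SUM(f2), using only java.util.stream.
    static Map<Pair<Object, Object>, Pair<Double, Double>> summarize(Stream<Row> rows) {
        return rows.collect(Collectors.groupingBy(
            (Row r) -> new Pair<Object, Object>(r.f3, r.f4),
            both(
                Collectors.summingDouble((Row r) -> r.f1),
                Collectors.summingDouble((Row r) -> r.f2))));
    }

    public static void main(String[] args) {
        // Holds (a, b)=(2.0, 4.0) and (c, d)=(7.0, 8.0); map order is not guaranteed.
        System.out.println(summarize(Stream.of(
            new Row(1, 2, "a", "b"),
            new Row(1, 2, "a", "b"),
            new Row(2, 2, "c", "d"),
            new Row(2, 2, "c", "d"),
            new Row(3, 4, "c", "d"))));
    }
}
```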

Lukas Eder answered Oct 10 '22 03:10


As OP said:

f1,f2 are double values. f3,f4 can be anything.

So let's consider this class:

public class TABLEX {
   double f1;
   double f2;
   Object f3;
   Object f4;
   // all-args constructor, getters and setters
}

Example list of TABLEX objects:

List<TABLEX> list = new ArrayList<>();
list.add(new TABLEX(1, 2, "a", "b"));
list.add(new TABLEX(1, 2, "a", "b"));
list.add(new TABLEX(2, 2, "c", "d"));
list.add(new TABLEX(2, 2, "c", "d"));
list.add(new TABLEX(3, 4, "c", "d"));

Equivalent of SELECT sum(f1) FROM TABLEX GROUP BY f3, f4:

Map<Object, Map<Object, Double>> sumF1 = list.stream().collect(
                Collectors.groupingBy(
                        TABLEX::getF3,
                        Collectors.groupingBy(TABLEX::getF4,
                                Collectors.summingDouble(TABLEX::getF1))));
System.out.println(sumF1);

Output:

{a={b=2.0}, c={d=7.0}}

Equivalent of SELECT sum(f2) FROM TABLEX GROUP BY f3, f4:

Map<Object, Map<Object, Double>> sumF2 = list.stream().collect(
                Collectors.groupingBy(
                        TABLEX::getF3,
                        Collectors.groupingBy(TABLEX::getF4,
                                Collectors.summingDouble(TABLEX::getF2))));
System.out.println(sumF2);

Output:

{a={b=4.0}, c={d=8.0}}

EDIT:

To get the expected result, combine the two maps:

sumF1.forEach((f3, m) -> m.forEach((f4, sum) -> System.out.println(sum
                + " " + sumF2.get(f3).get(f4) + " " + f3 + " " + f4)));

Output:

2.0 4.0 a b
7.0 8.0 c d
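
The two traversals and the lookup into the second map can also be folded into a single pass. Here is a sketch under assumptions, not a definitive implementation: Row is a hypothetical class with the same fields as TABLEX, the key is a List of (f3, f4), each row contributes a two-element double[] of (f1, f2), and Collectors.toMap adds up the arrays of rows that share a key. A LinkedHashMap keeps the groups in first-seen order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SinglePassGroupBy {

    // Hypothetical row type with the same fields as TABLEX.
    static final class Row {
        final double f1;
        final double f2;
        final Object f3;
        final Object f4;

        Row(double f1, double f2, Object f3, Object f4) {
            this.f1 = f1;
            this.f2 = f2;
            this.f3 = f3;
            this.f4 = f4;
        }
    }

    // One pass: key = [f3, f4], value = {sum of f1, sum of f2}.
    static Map<List<Object>, double[]> summarize(List<Row> rows) {
        return rows.stream().collect(Collectors.toMap(
            (Row r) -> Arrays.asList(r.f3, r.f4),
            (Row r) -> new double[] {r.f1, r.f2},
            // Merge two partial sums for the same key.
            (double[] a, double[] b) -> new double[] {a[0] + b[0], a[1] + b[1]},
            LinkedHashMap::new));
    }

    // Renders each group as "sumF1 sumF2 f3 f4", like the SQL result rows.
    static List<String> format(Map<List<Object>, double[]> sums) {
        return sums.entrySet().stream()
            .map(e -> e.getValue()[0] + " " + e.getValue()[1] + " "
                + e.getKey().get(0) + " " + e.getKey().get(1))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row(1, 2, "a", "b"),
            new Row(1, 2, "a", "b"),
            new Row(2, 2, "c", "d"),
            new Row(2, 2, "c", "d"),
            new Row(3, 4, "c", "d"));

        // Prints:
        // 2.0 4.0 a b
        // 7.0 8.0 c d
        format(summarize(rows)).forEach(System.out::println);
    }
}
```

Using a List as the key relies on List's element-wise equals() and hashCode(), so f3 and f4 must themselves implement those methods sensibly.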
MChaker answered Oct 10 '22 04:10