
Java 8 Lambda groupingBy X and Y simultaneously

I'm looking for a lambda to refine data that has already been retrieved. I have a raw result set, and if the user does not change the date I want to use Java's lambdas to group the results for them. I'm new to lambdas in Java.

The lambda I'm looking for works similarly to this query:

select z, w, min(x), max(x), avg(x), min(y), max(y), avg(y) from table group by z, w;
Hugo Prudente asked Jan 21 '15 19:01


2 Answers

So I'm assuming you have a List of objects and you want to create a map with the given groupings. I'm a bit confused by your x, y, w, z, so I'll use my own fields. Here's how I would do it:

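import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

interface Entry {
    String getGroup1();
    String getGroup2();
    int getIntData();
    double getDoubleData();
}

List<Entry> dataList; // populated elsewhere

// Outer map is keyed by group1, inner map by group2
Map<String, Map<String, IntSummaryStatistics>> groupedStats = 
    dataList.stream()
        .collect(Collectors.groupingBy(Entry::getGroup1,
            Collectors.groupingBy(Entry::getGroup2,
                Collectors.summarizingInt(Entry::getIntData))));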

Then, to get, say, the average of the data for items in groups A and B, use:

groupedStats.get("A").get("B").getAverage();
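
For reference, here is a minimal, self-contained sketch of the same two-level grouping that can be run as-is. The Row class, its sample values and the NestedGroupingDemo name are hypothetical stand-ins for whatever implements Entry in your code.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical concrete row type, standing in for the Entry interface above
class Row {
    private final String group1;
    private final String group2;
    private final int intData;

    Row(String group1, String group2, int intData) {
        this.group1 = group1;
        this.group2 = group2;
        this.intData = intData;
    }

    String getGroup1() { return group1; }
    String getGroup2() { return group2; }
    int getIntData() { return intData; }
}

class NestedGroupingDemo {
    // Groups by group1, then by group2, and summarises intData per inner group
    static Map<String, Map<String, IntSummaryStatistics>> summarise(List<Row> rows) {
        return rows.stream()
            .collect(Collectors.groupingBy(Row::getGroup1,
                Collectors.groupingBy(Row::getGroup2,
                    Collectors.summarizingInt(Row::getIntData))));
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("A", "B", 10),
            new Row("A", "B", 20),
            new Row("A", "C", 5));

        IntSummaryStatistics ab = summarise(rows).get("A").get("B");
        // prints "10 20 15.0"
        System.out.println(ab.getMin() + " " + ab.getMax() + " " + ab.getAverage());
    }
}
```

Note that get returns null for a group that never occurred, so chained lookups such as get("A").get("B") need a null check when the keys come from user input.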

If you want to summarise more than one set of data simultaneously then it gets a bit more complicated. You need to write your own wrapper class that can accumulate multiple statistics. Here's an example with both data items in Entry (I made them an int and a double to make it a bit more interesting).

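class CompoundStats {
    private final IntSummaryStatistics intDataStats = new IntSummaryStatistics();
    private final DoubleSummaryStatistics doubleDataStats = new DoubleSummaryStatistics();

    // Accumulator: records both values of one entry
    public void add(Entry entry) {
        intDataStats.accept(entry.getIntData());
        doubleDataStats.accept(entry.getDoubleData());
    }

    // Combiner: merges another partial result into this one (needed for parallel streams)
    public CompoundStats combine(CompoundStats other) {
        intDataStats.combine(other.intDataStats);
        doubleDataStats.combine(other.doubleDataStats);
        return this;
    }

    public IntSummaryStatistics getIntStats() {
        return intDataStats;
    }

    public DoubleSummaryStatistics getDoubleStats() {
        return doubleDataStats;
    }
}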

This class can then be used to create your own collector:

Map<String, Map<String, CompoundStats>> groupedStats = 
    dataList.stream()
        .collect(Collectors.groupingBy(Entry::getGroup1,
            Collectors.groupingBy(Entry::getGroup2,
                Collector.of(CompoundStats::new, CompoundStats::add, CompoundStats::combine))));

Now your maps return a CompoundStats instead of an IntSummaryStatistics:

groupedStats.get("A").get("B").getDoubleStats().getAverage();

Also note that this would be neater if you created a separate class to hold your groupings rather than using the two-step map I've proposed above. That is not a difficult modification if you need it.
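
A rough sketch of that single-key variant, under some assumptions: GroupKey, Item and FlatGroupingDemo are hypothetical names, and only one int field is summarised to keep the example short. The key class must implement equals and hashCode, otherwise every row ends up in its own group.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical key class holding both grouping values
final class GroupKey {
    private final String group1;
    private final String group2;

    GroupKey(String group1, String group2) {
        this.group1 = group1;
        this.group2 = group2;
    }

    // equals and hashCode make two keys with the same values count as the same group
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GroupKey)) return false;
        GroupKey other = (GroupKey) o;
        return Objects.equals(group1, other.group1)
            && Objects.equals(group2, other.group2);
    }

    @Override
    public int hashCode() {
        return Objects.hash(group1, group2);
    }
}

// Hypothetical row type for the demo
class Item {
    private final String group1;
    private final String group2;
    private final int intData;

    Item(String group1, String group2, int intData) {
        this.group1 = group1;
        this.group2 = group2;
        this.intData = intData;
    }

    String getGroup1() { return group1; }
    String getGroup2() { return group2; }
    int getIntData() { return intData; }
}

class FlatGroupingDemo {
    // One flat map keyed by (group1, group2) instead of a map of maps
    static Map<GroupKey, IntSummaryStatistics> summarise(List<Item> items) {
        return items.stream()
            .collect(Collectors.groupingBy(
                item -> new GroupKey(item.getGroup1(), item.getGroup2()),
                Collectors.summarizingInt(Item::getIntData)));
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
            new Item("A", "B", 10),
            new Item("A", "B", 20),
            new Item("A", "C", 5));

        // prints "15.0"
        System.out.println(summarise(items).get(new GroupKey("A", "B")).getAverage());
    }
}
```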

Hopefully this is useful in your own case.

sprinter answered Nov 01 '22 11:11


I'm going to be using the Tuple2 type from jOOλ for this exercise, but you can also create your own tuple type if you want to avoid the dependency.
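
If you go the dependency-free route, a hand-rolled tuple could look roughly like this. Pair and PairDemo are hypothetical names; the v1 and v2 fields simply mirror the field names used on Tuple2 in the code further down. This is only a sketch, not a replacement for the full jOOλ type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical minimal stand-in for a two-element tuple
final class Pair<A, B> {
    final A v1;
    final B v2;

    Pair(A v1, B v2) {
        this.v1 = v1;
        this.v2 = v2;
    }

    // Static factory so call sites can rely on type inference
    static <A, B> Pair<A, B> of(A v1, B v2) {
        return new Pair<>(v1, v2);
    }

    // Value-based equals and hashCode are what make the pair usable as a map key
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Pair)) return false;
        Pair<?, ?> other = (Pair<?, ?>) o;
        return Objects.equals(v1, other.v1) && Objects.equals(v2, other.v2);
    }

    @Override
    public int hashCode() {
        return Objects.hash(v1, v2);
    }

    @Override
    public String toString() {
        return "(" + v1 + ", " + v2 + ")";
    }
}

class PairDemo {
    public static void main(String[] args) {
        Map<Pair<Integer, Integer>, String> labels = new HashMap<>();
        labels.put(Pair.of(1, 1), "first group");

        // A new but equal key finds the same entry
        System.out.println(labels.get(Pair.of(1, 1))); // prints "first group"
        System.out.println(Pair.of(4, 9));             // prints "(4, 9)"
    }
}
```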

I'm also assuming you're using this to represent your data:

class A {
    final int w;
    final int x;
    final int y;
    final int z;

    A(int w, int x, int y, int z) {
        this.w = w;
        this.x = x;
        this.y = y;
        this.z = z;
    }
}

You can now write (assuming a static import of org.jooq.lambda.tuple.Tuple.tuple):

Map<Tuple2<Integer, Integer>, Tuple2<IntSummaryStatistics, IntSummaryStatistics>> map =
Stream.of(
    new A(1, 1, 1, 1),
    new A(1, 2, 3, 1),
    new A(9, 8, 6, 4),
    new A(9, 9, 7, 4),
    new A(2, 3, 4, 5),
    new A(2, 4, 4, 5),
    new A(2, 5, 5, 5))
.collect(Collectors.groupingBy(

    // This is your GROUP BY criteria
    a -> tuple(a.z, a.w),
    Collector.of(

        // When collecting, we'll aggregate data into two IntSummaryStatistics
        // for x and y
        () -> tuple(new IntSummaryStatistics(), new IntSummaryStatistics()),

        // The accumulator adds the x and y values of each element t
        // to the two statistics held in r
        (r, t) -> {
            r.v1.accept(t.x);
            r.v2.accept(t.y);
        },

        // The combiner will merge two partial aggregations,
        // in case this is executed in parallel
        (r1, r2) -> {
            r1.v1.combine(r2.v1);
            r1.v2.combine(r2.v2);

            return r1;
        }
    )
));

Or even better (using the latest jOOλ API):

Map<Tuple2<Integer, Integer>, Tuple2<IntSummaryStatistics, IntSummaryStatistics>> map =

// Seq is like a Stream, but sequential only, and with more features
Seq.of(
    new A(1, 1, 1, 1),
    new A(1, 2, 3, 1),
    new A(9, 8, 6, 4),
    new A(9, 9, 7, 4),
    new A(2, 3, 4, 5),
    new A(2, 4, 4, 5),
    new A(2, 5, 5, 5))

// Seq.groupBy() is just short for Stream.collect(Collectors.groupingBy(...))
.groupBy(
    a -> tuple(a.z, a.w),

    // Because once you have tuples, why not add tuple-collectors?
    Tuple.collectors(
        Collectors.summarizingInt(a -> a.x),
        Collectors.summarizingInt(a -> a.y)
    )
);

The map structure is now:

(z, w) -> (all_aggregations_of(x), all_aggregations_of(y))

Calling toString() on the above map will produce:

{
    (1, 1) = (IntSummaryStatistics{count=2, sum=3, min=1, average=1.500000, max=2}, 
              IntSummaryStatistics{count=2, sum=4, min=1, average=2.000000, max=3}), 
    (4, 9) = (IntSummaryStatistics{count=2, sum=17, min=8, average=8.500000, max=9}, 
              IntSummaryStatistics{count=2, sum=13, min=6, average=6.500000, max=7}), 
    (5, 2) = (IntSummaryStatistics{count=3, sum=12, min=3, average=4.000000, max=5}, 
              IntSummaryStatistics{count=3, sum=13, min=4, average=4.333333, max=5})
}

You now have all your statistics.

Side note

Of course, I don't know your exact requirements, but I suspect you'll quickly need more sophisticated aggregations in your report, such as medians, inverse distribution functions, and all sorts of nice OLAP features, which is when you realise that SQL is just a much easier language for this kind of task.

On the other hand, we'll definitely add more SQLesque features to jOOλ. This topic has also inspired me to write a full blog post with more details about the described approach.

Lukas Eder answered Nov 01 '22 11:11