Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Excluding extrema from HashSet with a Stream

I've been experimenting with Java 8 streams, is this is best way to remove the min and max scores.

private final Set<MatchScore> scores = new HashSet<>(10);

. . .

public double OPR() {
    return scores.stream()
            .mapToDouble(MatchScore::getScore)
            .filter((num) -> { //Exclude min and max score
                return num != scores.stream()
                                    .mapToDouble(MatchScore::getScore)
                                    .max().getAsDouble() 
                        && 
                       num != scores.stream()
                                    .mapToDouble(MatchScore::getScore)
                                    .min().getAsDouble();
            })
            .average().getAsDouble();
}
like image 609
jpdymond Avatar asked Dec 09 '22 08:12

jpdymond


2 Answers

A simpler approach would be:

return scores.stream()
        .mapToDouble(MatchScore::getScore)
        .sorted()
        .skip(1)
        .limit(scores.size() - 2)
        .average().getAsDouble();

Note: that works because elements in a set are unique - with a list there could be more than one element equal to the min or max score.


Performance wise*, the skip/limit is significantly faster on a small set of 10 elements (the Mean column shows the average time taken by a sample call, in nanoseconds):

Benchmark                      Mode   Samples         Mean   Mean error    Units
c.a.p.SO22923505.maxMin        avgt         5     6996.190      284.287    ns/op
c.a.p.SO22923505.skipLimit     avgt         5      479.935        4.547    ns/op

*using jmh - and here is the source code for the tests.

like image 74
assylias Avatar answered Dec 10 '22 22:12

assylias


One can use DoubleSummaryStatistics to collect the required information in a single pass over the data, and then subtract out the minimum and maximum:

@GenerateMicroBenchmark
public double summaryStats() {
    DoubleSummaryStatistics stats =
        scores.stream()
              .collect(Collectors.summarizingDouble(Double::doubleValue));

    if (stats.getCount() == 0L) {
        return 0.0; // or something
    } else {
        return (stats.getSum() - stats.getMin() - stats.getMax()) / (stats.getCount() - 2);
    }
}

Adding this code to assylias' benchmark code gives me the following results. Although my machine is slower overall, the relative performance of using DoubleSummaryStatistics over a single pass is faster.

Benchmark                         Mode   Samples         Mean   Mean error    Units
c.a.p.SO22923505.maxMin           avgt         5     9629.166     1051.585    ns/op
c.a.p.SO22923505.skipLimit        avgt         5      682.221       80.504    ns/op
c.a.p.SO22923505.summaryStats     avgt         5      412.740       85.372    ns/op
like image 37
Stuart Marks Avatar answered Dec 10 '22 20:12

Stuart Marks