Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java Streams - Standard Deviation

I wish to clarify upfront I am looking for a way to calculate Standard deviation using Streams (I have a working method at present which calculates & returns SD but without using Streams).

The dataset i am working with matches closely as seen in Link. As shown in this link am able to group my data & get the average but not able to figure out how to get the SD.

Code

outPut.stream()
            .collect(Collectors.groupingBy(e -> e.getCar(),
                    Collectors.averagingDouble(e -> (e.getHigh() - e.getLow()))))
            .forEach((car,avgHLDifference) -> System.out.println(car+ "\t" + avgHLDifference));

I also checked Link on DoubleSummaryStatistics but it doesn't seem to help for SD.

like image 233
iCoder Avatar asked Mar 28 '16 13:03

iCoder


People also ask

How do I calculate standard deviation in Java?

It is the square root of variance. Standard deviation is computed using the formula square root of ∑(Xi - ų)2 / N where Xi is the element of the array, ų is mean of the elements of the array, N is the number of elements, ∑ is the sum of the each element.

How do you find the variance and standard deviation in Java?

//Calculate Standard Deviation double Sum1 = 0; double Sum2 = 0; long count = 0; public Object compute() { if (count > 0) return Math. sqrt(count*Sum2 - Math. pow(Sum1. 2))/ count; else return null; } //Calculate Variance double sum = 0; double count = 0; public Object compute() { if(count>0) return (Math.

Are Java 8 streams lazy?

The Java 8 Streams API is fully based on the 'process only on demand' strategy and hence supports laziness. In the Java 8 Streams API, the intermediate operations are lazy and their internal processing model is optimised to make it being capable of processing the large amount of data with high performance.

Are Java streams more efficient?

Twice as better than a traditional for loop with an index int. Among the Java 8 methods, using parallel streams proved to be more effective.


1 Answers

You can use a custom collector for this task that calculates a sum of square. The buit-in DoubleSummaryStatistics collector does not keep track of it. This was discussed by the expert group in this thread but finally not implemented. The difficulty when calculating the sum of squares is the potential overflow when squaring the intermediate results.

static class DoubleStatistics extends DoubleSummaryStatistics {

    private double sumOfSquare = 0.0d;
    private double sumOfSquareCompensation; // Low order bits of sum
    private double simpleSumOfSquare; // Used to compute right sum for non-finite inputs

    @Override
    public void accept(double value) {
        super.accept(value);
        double squareValue = value * value;
        simpleSumOfSquare += squareValue;
        sumOfSquareWithCompensation(squareValue);
    }

    public DoubleStatistics combine(DoubleStatistics other) {
        super.combine(other);
        simpleSumOfSquare += other.simpleSumOfSquare;
        sumOfSquareWithCompensation(other.sumOfSquare);
        sumOfSquareWithCompensation(other.sumOfSquareCompensation);
        return this;
    }

    private void sumOfSquareWithCompensation(double value) {
        double tmp = value - sumOfSquareCompensation;
        double velvel = sumOfSquare + tmp; // Little wolf of rounding error
        sumOfSquareCompensation = (velvel - sumOfSquare) - tmp;
        sumOfSquare = velvel;
    }

    public double getSumOfSquare() {
        double tmp =  sumOfSquare + sumOfSquareCompensation;
        if (Double.isNaN(tmp) && Double.isInfinite(simpleSumOfSquare)) {
            return simpleSumOfSquare;
        }
        return tmp;
    }

    public final double getStandardDeviation() {
        return getCount() > 0 ? Math.sqrt((getSumOfSquare() / getCount()) - Math.pow(getAverage(), 2)) : 0.0d;
    }

}

Then, you can use this class with

Map<String, Double> standardDeviationMap =
    list.stream()
        .collect(Collectors.groupingBy(
            e -> e.getCar(),
            Collectors.mapping(
                e -> e.getHigh() - e.getLow(),
                Collector.of(
                    DoubleStatistics::new,
                    DoubleStatistics::accept,
                    DoubleStatistics::combine,
                    d -> d.getStandardDeviation()
                )
            )
        ));

This will collect the input list into a map where the values corresponds to the standard deviation of high - low for the same key.

like image 118
Tunaki Avatar answered Sep 19 '22 17:09

Tunaki