
Statistics on large stream of Integers in java

Tags: java, arraylist

I am reading a huge number of Integers from a file, and at the end I want to get some basic statistics from them (median, mean, 25th percentile, 75th percentile, etc.). I could calculate some of these statistics on the go, but it seems to me that calculating the 25th/75th percentile that way would be complicated. The simplest approach, I think, would be to place the Integers in a list and compute the statistics from that list. However, since the list is so large, using that much memory could slow the program down. Do you have any suggestions? This is roughly how I acquire the data, along with the two options I thought of:

Scanner input = new Scanner(new File("name"));
ArrayList<Integer> list = new ArrayList<Integer>();
while(input.hasNextLine()){
  list.add(Integer.parseInt(input.nextLine()));
}
doStatistics(list);

OR

Scanner input = new Scanner(new File("name"));
while(input.hasNextLine()){
   // I don't know how I would accomplish this for the percentile stats
   acquireStats(Integer.parseInt(input.nextLine()));
}
Julio Diaz asked Jun 14 '12 17:06



2 Answers

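If the number of distinct values is significantly smaller than the number of samples, it makes more sense to store a count per value than to store every sample. The code below assumes the values fall in the range 0 to 100 and clamps anything outside it.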

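long[] samples = new long[101]; // samples[v] = how many times the value v was read

while(input.hasNextLine()){
    try{
      // clamp the value into 0..100, then count it
      samples[Math.max(0, Math.min(100, Integer.parseInt(input.nextLine())))]++;
    } catch (NumberFormatException e){/*not a number*/}
}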

This leaves you with a huge set of data represented by just a tiny array of counts, from which the mean and any percentile can be computed by walking the counts in order.
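To show how the percentiles could then be read out of such a count array, here is a minimal sketch. It assumes the same 0..100 value range as the answer and uses the nearest-rank definition of a percentile, which is only one of several definitions in use. The class `HistogramStats` and its methods are hypothetical names, not part of the original answer.

```java
/** Sketch: statistics from a count-per-value array (values assumed to lie in 0..counts.length-1). */
class HistogramStats {

    /** Records one sample, clamping it into the array's index range. */
    static void add(long[] counts, int value) {
        counts[Math.max(0, Math.min(counts.length - 1, value))]++;
    }

    /**
     * Nearest-rank percentile: the smallest value whose cumulative count
     * reaches ceil(p / 100 * n), where n is the total number of samples.
     */
    static int percentile(long[] counts, double p) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        if (total == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        long rank = Math.max(1, (long) Math.ceil(p / 100.0 * total));
        long seen = 0;
        for (int v = 0; v < counts.length; v++) {
            seen += counts[v];
            if (seen >= rank) {
                return v;
            }
        }
        return counts.length - 1; // not reached when rank <= total
    }

    /** Mean of all recorded samples, computed from the counts. */
    static double mean(long[] counts) {
        long total = 0;
        double sum = 0;
        for (int v = 0; v < counts.length; v++) {
            total += counts[v];
            sum += (double) v * counts[v];
        }
        if (total == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        return sum / total;
    }
}
```

With this, `HistogramStats.percentile(samples, 25)`, `percentile(samples, 50)` and `percentile(samples, 75)` would give the quartiles and the median after the read loop finishes. Memory use depends only on the value range, not on the number of lines in the file.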

Andrew answered Oct 19 '22 09:10



This article and John D. Cook are your best bets:

http://www.codeproject.com/Articles/33781/Calculate-Percentiles-in-O-1-space-and-O-n-time
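The linked article's algorithm is not reproduced here. As a separate illustration of the "on the go" style the question mentions, the mean and variance can be tracked in constant memory with Welford's update. This sketch does not compute percentiles, and `RunningStats` and its methods are hypothetical names chosen for the example.

```java
/** Sketch: running mean and sample variance in O(1) memory (Welford's method). */
class RunningStats {
    private long n;      // number of samples seen so far
    private double mean; // running mean
    private double m2;   // running sum of squared deviations from the mean

    /** Folds one sample into the running statistics. */
    void add(int x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // uses the already-updated mean
    }

    long count() {
        return n;
    }

    /** Mean of the samples seen so far (0.0 if none). */
    double mean() {
        return mean;
    }

    /** Sample variance (divides by n - 1); 0.0 with fewer than two samples. */
    double variance() {
        return n > 1 ? m2 / (n - 1) : 0.0;
    }
}
```

In the question's second loop, `acquireStats(...)` could be replaced by a call to `add(...)` on one `RunningStats` instance, with `mean()` and `variance()` read after the file has been consumed.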

duffymo answered Oct 19 '22 09:10
