I am reading a huge number of integers from a file, and at the end I want some basic statistics on them (median, mean, 25th percentile, 75th percentile, etc.). I could calculate some of these on the fly, but calculating the 25th/75th percentile that way seems complicated. The simplest approach, I think, would be to put the integers in a list and compute the statistics from that list. However, the list would be so large that its memory use could slow the program down. Do you have any suggestions? This is roughly how I read the data, along with the two options I thought of:
Scanner input = new Scanner(new File("name"));
ArrayList<Integer> list = new ArrayList<Integer>();
while(input.hasNextLine()){
list.add(Integer.parseInt(input.nextLine()));
}
doStatistics(list);
OR
Scanner input = new Scanner(new File("name"));
while(input.hasNextLine()){
//I don't know how I would accomplish this for the percentile stats
acquireStats(Integer.parseInt(input.nextLine()));
}
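Given that the number of distinct values is significantly smaller than the number of samples, it makes more sense to store a count per value than to store every sample. The snippet below assumes the values fall in the range 0 to 100 and clamps anything outside that range.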
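long[] samples = new long[101]; // samples[v] = how many times value v was seen
while(input.hasNextLine()){
try{
// clamp the value to 0..100, then count one occurrence of it
samples[Math.max(0, Math.min(100, Integer.parseInt(input.nextLine())))]++;
} catch (NumberFormatException e){/*not a number*/}
}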
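This leaves you with a huge set of data represented by just a tiny array. The mean follows directly from the counts, and a percentile is the first value at which the running total of counts reaches the required fraction of the samples.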
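To show how the statistics could then be read off that array, here is a minimal sketch. The class name HistogramStats and its methods are made up for illustration, and it uses the nearest-rank definition of a percentile, which is only one of several common definitions, so treat it as a starting point rather than the definitive way to do it.

```java
public class HistogramStats {

    // Nearest-rank percentile: the smallest value v such that at least
    // p percent of all samples are <= v. counts[v] is the number of
    // times value v occurred (same layout as the samples array above).
    static int percentile(long[] counts, double p) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        if (total == 0) {
            throw new IllegalArgumentException("no samples");
        }
        // 1-based rank of the sample we are looking for
        long rank = Math.max(1, (long) Math.ceil(p / 100.0 * total));
        long seen = 0;
        for (int value = 0; value < counts.length; value++) {
            seen += counts[value];
            if (seen >= rank) {
                return value;
            }
        }
        return counts.length - 1; // not reached while rank <= total
    }

    // Mean computed from the counts alone: sum(v * counts[v]) / total.
    static double mean(long[] counts) {
        long total = 0;
        long sum = 0;
        for (int value = 0; value < counts.length; value++) {
            total += counts[value];
            sum += value * counts[value];
        }
        if (total == 0) {
            throw new IllegalArgumentException("no samples");
        }
        return (double) sum / total;
    }

    public static void main(String[] args) {
        long[] counts = new long[101];
        // small made-up data set standing in for the file contents
        int[] data = {1, 2, 2, 3, 4, 4, 4, 100};
        for (int d : data) {
            counts[d]++;
        }
        System.out.println("25th:   " + percentile(counts, 25));
        System.out.println("median: " + percentile(counts, 50));
        System.out.println("75th:   " + percentile(counts, 75));
        System.out.println("mean:   " + mean(counts));
    }
}
```

For the eight made-up samples in main this gives 2, 3 and 4 for the 25th, 50th and 75th percentiles and a mean of 15.0. Memory use depends only on the range of values, not on how many lines the file has.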
This article and John D. Cook are your best bets:
http://www.codeproject.com/Articles/33781/Calculate-Percentiles-in-O-1-space-and-O-n-time