I've got what ultimately amounts to a large array of numbers (pulled from a database).
So, it might look like: [1,3,1,2,1,3,1,2,3,1,2,3,1,3,1,3,1,1,3,2,3,3,3,3,1,1,1,1,3,2,1]
Except it could be 50,000 numbers instead a few dozen. The lowest number will always be 1 and highest is 3.
What I need to do is find some sort of rolling average so I can display the data in a manageable line chart.
So maybe average the numbers of every 5-10 data points? Just not sure what the best way to process something like this.
NOTE: Not looking to get a single average. I'm looking to distill the whole array down into a few more average points. So, a data set of 1000 points might be broken down into 10 average numbers.
1.9.3p327 :001 > a = [1,3,1,2,1,3,1,2,3,1,2,3,1,3,1,3,1,1,3,2,3,3,3,3,1,1,1,1,3,2,1]
=> [1, 3, 1, 2, 1, 3, 1, 2, 3, 1, 2, 3, 1, 3, 1, 3, 1, 1, 3, 2, 3, 3, 3, 3, 1, 1, 1, 1, 3, 2, 1]
1.9.3p327 :002 > a.each_cons(10).map { |subarray| subarray.reduce(0.0, :+) / subarray.size }
=> [1.8, 1.9, 1.9, 1.9, 2.0, 2.0, 2.0, 2.0, 1.9, 1.9, 2.0, 2.1, 2.1, 2.3, 2.3, 2.3, 2.1, 2.1, 2.1, 2.1, 2.1, 1.9]
This is not good in terms of performance though. It's O(NM), where N is the size of the array, and M is the size of the window (10 in this case).
UPD: Or you can use each_slice, if you need to "reduce" the array size significantly:
1.9.3p327 :002 > a.each_slice(10).map { |subarray| subarray.reduce(0.0, :+) / subarray.size }
=> [1.8, 2.0, 2.1, 1.0]
This sliced average can be acquired directly via the database select. Your database engine is almost certain to do grouping and average calculation at least an order of magnitude faster than ruby, in addition, you will transfer far less data over the wire from your db to your program and dramatically reduce the number of objects instantiated in your ruby program to represent the result set.
So, if your original query looked something like this (in Postgresql):
select value from mytable;
You can modify it to produce average over every ten items like this:
select avg(value) as chunk_avg, row/10 as chunk
from
(select value, row_number() over () - 1 as row
from mytable) x
group by chunk
order by chunk;
SqlFiddle
If you don't want the chunk number in the result, you could wrap this in another outer select that just projects chunk_avg, or drop the chunk field form the select clause and literally replace chunk with row/10 in the group by and order by clauses.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With