Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Average a large array?

I've got what ultimately amounts to a large array of numbers (pulled from a database).

So, it might look like: [1,3,1,2,1,3,1,2,3,1,2,3,1,3,1,3,1,1,3,2,3,3,3,3,1,1,1,1,3,2,1]

Except it could be 50,000 numbers instead a few dozen. The lowest number will always be 1 and highest is 3.

What I need to do is find some sort of rolling average so I can display the data in a manageable line chart.

So maybe average the numbers of every 5-10 data points? Just not sure what the best way to process something like this.

NOTE: Not looking to get a single average. I'm looking to distill the whole array down into a few more average points. So, a data set of 1000 points might be broken down into 10 average numbers.

like image 683
Shpigford Avatar asked Mar 04 '26 20:03

Shpigford


2 Answers

1.9.3p327 :001 > a = [1,3,1,2,1,3,1,2,3,1,2,3,1,3,1,3,1,1,3,2,3,3,3,3,1,1,1,1,3,2,1]
 => [1, 3, 1, 2, 1, 3, 1, 2, 3, 1, 2, 3, 1, 3, 1, 3, 1, 1, 3, 2, 3, 3, 3, 3, 1, 1, 1, 1, 3, 2, 1]
1.9.3p327 :002 > a.each_cons(10).map { |subarray| subarray.reduce(0.0, :+) / subarray.size }
 => [1.8, 1.9, 1.9, 1.9, 2.0, 2.0, 2.0, 2.0, 1.9, 1.9, 2.0, 2.1, 2.1, 2.3, 2.3, 2.3, 2.1, 2.1, 2.1, 2.1, 2.1, 1.9]

This is not good in terms of performance though. It's O(NM), where N is the size of the array, and M is the size of the window (10 in this case).

UPD: Or you can use each_slice, if you need to "reduce" the array size significantly:

1.9.3p327 :002 > a.each_slice(10).map { |subarray| subarray.reduce(0.0, :+) / subarray.size }
 => [1.8, 2.0, 2.1, 1.0]
like image 181
DNNX Avatar answered Mar 07 '26 11:03

DNNX


This sliced average can be acquired directly via the database select. Your database engine is almost certain to do grouping and average calculation at least an order of magnitude faster than ruby, in addition, you will transfer far less data over the wire from your db to your program and dramatically reduce the number of objects instantiated in your ruby program to represent the result set.

So, if your original query looked something like this (in Postgresql):

select value from mytable;

You can modify it to produce average over every ten items like this:

select avg(value) as chunk_avg, row/10 as chunk
from 
  (select value, row_number() over () - 1 as row
   from mytable) x
group by chunk
order by chunk;

SqlFiddle

If you don't want the chunk number in the result, you could wrap this in another outer select that just projects chunk_avg, or drop the chunk field form the select clause and literally replace chunk with row/10 in the group by and order by clauses.

like image 27
dbenhur Avatar answered Mar 07 '26 09:03

dbenhur



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!