Average a large array?

Question

I've got what ultimately amounts to a large array of numbers (pulled from a database).

So, it might look like: [1,3,1,2,1,3,1,2,3,1,2,3,1,3,1,3,1,1,3,2,3,3,3,3,1,1,1,1,3,2,1]

Except it could be 50,000 numbers instead a few dozen. The lowest number will always be 1 and highest is 3.

What I need to do is find some sort of rolling average so I can display the data in a manageable line chart.

So maybe average the numbers of every 5-10 data points? Just not sure what the best way to process something like this.

NOTE: Not looking to get a single average. I'm looking to distill the whole array down into a few more average points. So, a data set of 1000 points might be broken down into 10 average numbers.

DNNX · Accepted Answer

1.9.3p327 :001 > a = [1,3,1,2,1,3,1,2,3,1,2,3,1,3,1,3,1,1,3,2,3,3,3,3,1,1,1,1,3,2,1]
 => [1, 3, 1, 2, 1, 3, 1, 2, 3, 1, 2, 3, 1, 3, 1, 3, 1, 1, 3, 2, 3, 3, 3, 3, 1, 1, 1, 1, 3, 2, 1]
1.9.3p327 :002 > a.each_cons(10).map { |subarray| subarray.reduce(0.0, :+) / subarray.size }
 => [1.8, 1.9, 1.9, 1.9, 2.0, 2.0, 2.0, 2.0, 1.9, 1.9, 2.0, 2.1, 2.1, 2.3, 2.3, 2.3, 2.1, 2.1, 2.1, 2.1, 2.1, 1.9]

This is not good in terms of performance though. It's O(NM), where N is the size of the array, and M is the size of the window (10 in this case).

UPD: Or you can use each_slice, if you need to "reduce" the array size significantly:

1.9.3p327 :002 > a.each_slice(10).map { |subarray| subarray.reduce(0.0, :+) / subarray.size }
 => [1.8, 2.0, 2.1, 1.0]

dbenhur · Answer

This sliced average can be acquired directly via the database select. Your database engine is almost certain to do grouping and average calculation at least an order of magnitude faster than ruby, in addition, you will transfer far less data over the wire from your db to your program and dramatically reduce the number of objects instantiated in your ruby program to represent the result set.

So, if your original query looked something like this (in Postgresql):

select value from mytable;

You can modify it to produce average over every ten items like this:

select avg(value) as chunk_avg, row/10 as chunk
from 
  (select value, row_number() over () - 1 as row
   from mytable) x
group by chunk
order by chunk;

SqlFiddle

If you don't want the chunk number in the result, you could wrap this in another outer select that just projects chunk_avg, or drop the chunk field form the select clause and literally replace chunk with row/10 in the group by and order by clauses.

Average a large array?

Tags:

arrays

ruby

postgresql

average

Shpigford

2 Answers

DNNX

dbenhur

Recent Activity

Donate For Us

Average a large array?

Tags:

arrays

ruby

postgresql

average

Shpigford

2 Answers

DNNX

dbenhur

Related questions

Recent Activity

Donate For Us