Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Aggregate functions over arrays

I have a table like this:

+-----+----------------+
| ID  |  array300      |
+-----+----------------+
| 100 | {110,25,53,..} |
| 101 | {56,75,59,...} |
| 102 | {65,93,82,...} |
| 103 | {75,70,80,...} |
+-----+----------------+

array300 column is an array of 300 elements. I need to have arrays of 100 elements with every element representing the average of 3 elements of array300. For this example the answer will be like:
array100
{62.66,...}
{63.33,...}
{80,...}
{78.33,...}

like image 313
f.ashouri Avatar asked Dec 10 '12 15:12

f.ashouri


People also ask

What does ARRAY_AGG do in SQL?

The ARRAY_AGG() accepts an expression that returns a value of any type which is valid for an array element. The ORDER BY clause is an optional clause. It specifies the order of rows processed in the aggregation, which determines the order of the elements in the result array.

When would you use an aggregate function?

An aggregate function performs a calculation on a set of values, and returns a single value. Except for COUNT(*) , aggregate functions ignore null values. Aggregate functions are often used with the GROUP BY clause of the SELECT statement.

What are the 5 aggregate functions?

There are five aggregate functions, which are: MIN, MAX, COUNT, SUM, and AVG.

What is array aggregation?

The ARRAY_AGG function aggregates a set of elements into an array. Invocation of the ARRAY_AGG aggregate function is based on the result array type.


2 Answers

Try something like this:

SELECT id, unnest(array300) as val, ntile(100) OVER (PARTITION BY id) as bucket_num
FROM your_table

This SELECT will give you 300 records per array300 with same id and assing them the bucket_num (1 for firs 3 elements, 2 for next 3, and so on).

Then use this select to get the avg of elements in the bucket:

SELECT id, avg(val) as avg_val
FROM (...previous select here...)
GROUP BY id, bucket_num

Next - just aggregate the avg_val into array:

SELECT id, array_agg(avg_val) as array100
FROM (...previous select here...)
GROUP BY id

Details: unnest , ntile , array_agg , OVER (PARTITION BY )

UPD: Try this function:

CREATE OR REPLACE FUNCTION public.array300_to_100 (
  p_array300 numeric []
)
RETURNS numeric [] AS
$body$
DECLARE
  dim_start int = array_length(p_array300, 1); --size of input array
  dim_end int = 100; -- size of output array
  dim_step int = dim_start / dim_end; --avg batch size
  tmp_sum NUMERIC; --sum of the batch
  result_array NUMERIC[100]; -- resulting array
BEGIN

  FOR i IN 1..dim_end LOOP --from 1 to 100.
    tmp_sum = 0;

    FOR j IN (1+(i-1)*dim_step)..i*dim_step LOOP --from 1 to 3, 4 to 6, ...
      tmp_sum = tmp_sum + p_array300[j];  
    END LOOP; 

    result_array[i] = tmp_sum / dim_step;
  END LOOP; 

  RETURN result_array;
END;
$body$
LANGUAGE 'plpgsql'
IMMUTABLE
RETURNS NULL ON NULL INPUT;

It takes one array300 and outputs one array100. To use it:

SELECT id, array300_to_100(array300)
FROM table1;

If you have any problems understanding it - just ask me.

like image 174
Ihor Romanchenko Avatar answered Sep 30 '22 17:09

Ihor Romanchenko


Putting the pieces of Igor into another form:

 select id, array300, (
    select array_agg(z) from
    (
        select avg(x) from 
        (
            select x, ntile(array_length(array300,1)/3) over() from unnest(array300) x
        ) y 
        group by ntile
    ) z
) array100
from your_table

For a small example table like this

 id |       array300        
----+-----------------------
  1 | {110,25,53,110,25,53}
  2 | {56,75,59,110,25,53}
  3 | {65,93,82,110,25,53}
  4 | {75,70,80,110,25,53}

the result is:

 id |       array300        |                   array100                    
----+-----------------------+-----------------------------------------------
  1 | {110,25,53,110,25,53} | {(62.6666666666666667),(62.6666666666666667)}
  2 | {56,75,59,110,25,53}  | {(63.3333333333333333),(62.6666666666666667)}
  3 | {65,93,82,110,25,53}  | {(80.0000000000000000),(62.6666666666666667)}
  4 | {75,70,80,110,25,53}  | {(75.0000000000000000),(62.6666666666666667)}
(4 rows)

Edit My first version used a fixes ntile(2). This only worked for source arrays of size 6. I've fixed that by using array_length(array300,1)/3 instead.

like image 41
A.H. Avatar answered Sep 30 '22 17:09

A.H.