I have a table like this:
+-----+----------------+ | ID | array300 | +-----+----------------+ | 100 | {110,25,53,..} | | 101 | {56,75,59,...} | | 102 | {65,93,82,...} | | 103 | {75,70,80,...} | +-----+----------------+
array300 column is an array of 300 elements. I need to have arrays of 100 elements with every element representing the average of 3 elements of array300. For this example the answer will be like:
array100
{62.66,...}
{63.33,...}
{80,...}
{78.33,...}
The ARRAY_AGG() accepts an expression that returns a value of any type which is valid for an array element. The ORDER BY clause is an optional clause. It specifies the order of rows processed in the aggregation, which determines the order of the elements in the result array.
An aggregate function performs a calculation on a set of values, and returns a single value. Except for COUNT(*) , aggregate functions ignore null values. Aggregate functions are often used with the GROUP BY clause of the SELECT statement.
There are five aggregate functions, which are: MIN, MAX, COUNT, SUM, and AVG.
The ARRAY_AGG function aggregates a set of elements into an array. Invocation of the ARRAY_AGG aggregate function is based on the result array type.
Try something like this:
SELECT id, unnest(array300) as val, ntile(100) OVER (PARTITION BY id) as bucket_num
FROM your_table
This SELECT
will give you 300 records per array300
with same id
and assing them the bucket_num
(1 for firs 3 elements, 2 for next 3, and so on).
Then use this select to get the avg
of elements in the bucket:
SELECT id, avg(val) as avg_val
FROM (...previous select here...)
GROUP BY id, bucket_num
Next - just aggregate the avg_val
into array:
SELECT id, array_agg(avg_val) as array100
FROM (...previous select here...)
GROUP BY id
Details: unnest , ntile , array_agg , OVER (PARTITION BY )
UPD: Try this function:
CREATE OR REPLACE FUNCTION public.array300_to_100 (
p_array300 numeric []
)
RETURNS numeric [] AS
$body$
DECLARE
dim_start int = array_length(p_array300, 1); --size of input array
dim_end int = 100; -- size of output array
dim_step int = dim_start / dim_end; --avg batch size
tmp_sum NUMERIC; --sum of the batch
result_array NUMERIC[100]; -- resulting array
BEGIN
FOR i IN 1..dim_end LOOP --from 1 to 100.
tmp_sum = 0;
FOR j IN (1+(i-1)*dim_step)..i*dim_step LOOP --from 1 to 3, 4 to 6, ...
tmp_sum = tmp_sum + p_array300[j];
END LOOP;
result_array[i] = tmp_sum / dim_step;
END LOOP;
RETURN result_array;
END;
$body$
LANGUAGE 'plpgsql'
IMMUTABLE
RETURNS NULL ON NULL INPUT;
It takes one array300
and outputs one array100
. To use it:
SELECT id, array300_to_100(array300)
FROM table1;
If you have any problems understanding it - just ask me.
Putting the pieces of Igor into another form:
select id, array300, (
select array_agg(z) from
(
select avg(x) from
(
select x, ntile(array_length(array300,1)/3) over() from unnest(array300) x
) y
group by ntile
) z
) array100
from your_table
For a small example table like this
id | array300
----+-----------------------
1 | {110,25,53,110,25,53}
2 | {56,75,59,110,25,53}
3 | {65,93,82,110,25,53}
4 | {75,70,80,110,25,53}
the result is:
id | array300 | array100
----+-----------------------+-----------------------------------------------
1 | {110,25,53,110,25,53} | {(62.6666666666666667),(62.6666666666666667)}
2 | {56,75,59,110,25,53} | {(63.3333333333333333),(62.6666666666666667)}
3 | {65,93,82,110,25,53} | {(80.0000000000000000),(62.6666666666666667)}
4 | {75,70,80,110,25,53} | {(75.0000000000000000),(62.6666666666666667)}
(4 rows)
Edit My first version used a fixes ntile(2)
. This only worked for source arrays of size 6. I've fixed that by using array_length(array300,1)/3
instead.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With