I'm struggling doing aggregations on a JSONB field in a PostgreSQL database. This is probably easier explained with an example so if create and populate a table called analysis
with 2 columns (id
and analysis
) as follows: -
create table analysis (
id serial primary key,
analysis jsonb
);
insert into analysis
(id, analysis) values
(1, '{"category" : "news", "results" : [1, 2, 3, 4, 5 , 6, 7, 8, 9, 10, 11, 12, 13, 14, null, null]}'),
(2, '{"category" : "news", "results" : [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, null, 26]}'),
(3, '{"category" : "news", "results" : [31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46]}'),
(4, '{"category" : "sport", "results" : [51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66]}'),
(5, '{"category" : "sport", "results" : [71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86]}'),
(6, '{"category" : "weather", "results" : [91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106]}');
As you can see the analysis
JSONB field always contains 2 attributes category
and results
. The results attribute will always contain an fixed length array of size 16. I've used various functions such as jsonb_array_elements
but what I'm trying to do is the following: -
When I want is a statement to return 3 rows grouped by category (i.e. news
, sport
and weather
) and a 16 fixed length array containing averages. To further complicate things, if there are null
s in the array then we should ignore them (i.e. we are not simply summing and averaging by the number of rows). The result should look something like the following: -
category | analysis_average
-----------+--------------------------------------------------------------------------------------------------------------
"news" | [14.33, 15.33, 16.33, 17.33, 18.33, 19.33, 20.33, 21.33, 22.33, 23.33, 24.33, 25.33, 26.33, 27.33, 45, 36]
"sport" | [61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76]
"weather" | [91, 92, 93, 94, 95, 96, 97, 98, 99, 00, 101, 102, 103, 104, 105, 106]
NOTE: Notice the 45
and 36
in the last 2 array itmes on the 1st row which illustrates ignoring the nulls
s.
I had considered creating a view which exploded the array into 16 columns i.e.
create view analysis_view as
select a.*,
(a.analysis->'results'->>0)::int as result0,
(a.analysis->'results'->>1)::int as result1
/* ... etc for all 16 array entries .. */
from analysis a;
This seems extremely inelegant to me and removes the advantages of using an array in the first place but could probably hack something together using that approach.
Any pointers or tips will be most appreciated!
Also performance is really important here so the higher the performance the better!
JSONB and IndexesPostgreSQL can use indexes for the text results as compare operands. GIN index can be used by the GIN JSONB operator class.
Querying the JSON documentPostgreSQL has two native operators -> and ->> to query JSON documents. The first operator -> returns a JSON object, while the operator ->> returns text. These operators work on both JSON as well as JSONB columns. There are additional operators available for JSONB columns.
(Note: It is possible to make a jsonb[] column, but we don't recommend it, as there's no value over a jsonb column that contains an array.)
Most applications should use JSONB for schemaless data. It stores parsed JSON in a binary format, so queries are efficient.
This will work for any array length
select category, array_agg(average order by subscript) as average
from (
select
a.analysis->>'category' category,
subscript,
avg(v)::numeric(5,2) as average
from
analysis a,
lateral unnest(
array(select jsonb_array_elements_text(analysis->'results')::int)
) with ordinality s(v,subscript)
group by 1, 2
) s
group by category
;
category | average
----------+----------------------------------------------------------------------------------------------------------
news | {14.33,15.33,16.33,17.33,18.33,19.33,20.33,21.33,22.33,23.33,24.33,25.33,26.33,27.33,45.00,36.00}
sport | {61.00,62.00,63.00,64.00,65.00,66.00,67.00,68.00,69.00,70.00,71.00,72.00,73.00,74.00,75.00,76.00}
weather | {91.00,92.00,93.00,94.00,95.00,96.00,97.00,98.00,99.00,100.00,101.00,102.00,103.00,104.00,105.00,106.00}
table functions - with ordinality
lateral
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With