In PostgreSQL 9.3, I am storing some fairly complex JSON objects with arrays nested within arrays. This snippet isn't the real data, but illustrates the same concept:
{
"customerId" : "12345",
"orders" : [{
"orderId" : "54321",
"lineItems" : [{
"productId" : "abc",
"qty" : 3
}, {
"productId" : "def",
"qty" : 1
}]
}
}
I want the ability for SQL queries to operate on lineItem
objects... not just within this single JSON structure, but across all JSON objects in that table column. For instance, a SQL query that returns all distinct productId
's, and their total sold qty
sums. To prevent such a query from taking all day, I would probably want an index on lineItem
or its child fields.
Using this StackOverflow question, I figured out how to write a query that works:
SELECT
line_item->>'productId' AS product_id,
SUM(CAST(line_item->>'qty' AS INTEGER)) AS qty_sold
FROM
my_table,
json_array_elements(my_table.my_json_column->'orders') AS order,
json_array_elements(order->'lineItems') AS line_item
GROUP BY product_id;
However, that original StackOverflow question dealt with data that was only nested one level deep rather than two. I extended the same concept (i.e. a "lateral join" within the FROM
clause) by adding an extra lateral join to dive one level deeper. However, I'm not sure if this is the best approach, so the first part of my question is: What's the best approach for querying JSON data that is an arbitrary number of levels deep in the JSON objects?
For the second part of this, creating an index on such nested data, this StackOverflow question again deals with data nested only one level deep. However, I'm just completely lost, with my head swimming trying to think of how I would apply this to a deeper number of levels. Can anyone please offer a clear approach for indexing data that is at least two levels deep, as with lineItems
above?
To deal with the infinite recursion issue, you need to use a recursive CTE to operate upon each individual json element in each table row:
WITH RECURSIVE
raw_json as (
SELECT
*
FROM
(VALUES
(1,
'{
"customerId": "12345",
"orders": [
{
"orderId": "54321",
"lineItems": [
{
"productId": "abc",
"qty": 3
},
{
"productId": "def",
"qty": 1
}
]
}
]
}'::json),
(2,
'{
"customerId": "678910",
"artibitraryLevel": {
"orders": [
{
"orderId": "55345",
"lineItems": [
{
"productId": "abc",
"qty": 3
},
{
"productId": "ghi",
"qty": 10
}
]
}
]
}
}'::json)
) a(id,sample_json)
),
json_recursive as (
SELECT
a.id,
b.k,
b.v,
b.json_type,
case when b.json_type = 'object' and not (b.v->>'customerId') is null then b.v->>'customerId' else a.customer_id end customer_id, --track any arbitrary id when iterating through json graph
case when b.json_type = 'object' and not (b.v->>'orderId') is null then b.v->>'orderId' else a.order_id end order_id,
case when b.json_type = 'object' and not (b.v->>'productId') is null then b.v->>'productId' else a.product_id end product_id
FROM
(
SELECT
id,
sample_json v,
case left(sample_json::text,1)
when '[' then 'array'
when '{' then 'object'
else 'scalar'
end json_type, --because choice of json accessor function depends on this, and for some reason postgres has no built in function to get this value
sample_json->>'customerId' customer_id,
sample_json->>'orderId' order_id,
sample_json->>'productId' product_id
FROM
raw_json
) a
CROSS JOIN LATERAL (
SELECT
b.k,
b.v,
case left(b.v::text,1)
when '[' then 'array'
when '{' then 'object'
else 'scalar'
end json_type
FROM
json_each(case json_type when 'object' then a.v else null end ) b(k,v) --get key value pairs for individual elements if we are dealing with standard object
UNION ALL
SELECT
null::text k,
c.v,
case left(c.v::text,1)
when '[' then 'array'
when '{' then 'object'
else 'scalar'
end json_type
FROM
json_array_elements(case json_type when 'array' then a.v else null end) c(v) --if we have an array, just get the elements and use parent key
) b
UNION ALL --recursive term
SELECT
a.id,
b.k,
b.v,
b.json_type,
case when b.json_type = 'object' and not (b.v->>'customerId') is null then b.v->>'customerId' else a.customer_id end customer_id,
case when b.json_type = 'object' and not (b.v->>'orderId') is null then b.v->>'orderId' else a.order_id end order_id,
case when b.json_type = 'object' and not (b.v->>'productId') is null then b.v->>'productId' else a.product_id end product_id
FROM
json_recursive a
CROSS JOIN LATERAL (
SELECT
b.k,
b.v,
case left(b.v::text,1)
when '[' then 'array'
when '{' then 'object'
else 'scalar'
end json_type
FROM
json_each(case json_type when 'object' then a.v else null end ) b(k,v)
UNION ALL
SELECT
a.k,
c.v,
case left(c.v::text,1)
when '[' then 'array'
when '{' then 'object'
else 'scalar'
end json_type
FROM
json_array_elements(case json_type when 'array' then a.v else null end) c(v)
) b
)
Then you can either sum "qty" by an arbitrary id...
SELECT
customer_id,
sum(v::text::integer)
FROM
json_recursive
WHERE
k = 'qty'
GROUP BY
customer_id
Or you can get the "lineItem" objects and manipulate them as you desire:
SELECT
*
FROM
json_recursive
WHERE
k = 'lineItems' and json_type = 'object'
As for the indexing, you could adapt the recursive query into a function that returns the unique keys for each json object in each row of your original table and then it to create a functional index on your json column:
SELECT
array_agg(DISTINCT k)
FROM
json_recursive
WHERE
not k is null
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With