I have a query that groups by a two variables to get a total of another. In order to maintain my table structure for later computations I listagg() two other variables to save for the next stage of the query. However, when I attempt to do two later flatten's of the listagg() columns my data is repeated to many times.
Example: my_table
id | list1 | code| list2 | total
--------|-----------------|-----|----------|---
2434166 | 735,768,769,746 | 124 | 21,2,1,6 | 30
select
id,
list1_table.value::int as list1_val,
code,
list2.value::int as list2_val,
total
from my_table
lateral flatten(input=>split(list1, ',')) list1_table,
lateral flatten(input=>split(list2, ',')) list2_table
Result:
id | list1 | code| list2 | total
--------|-----------------|-----|----------|---
2434166 | 768 | 124 | 2 | 30
2434166 | 735 | 124 | 2 | 30
2434166 | 746 | 124 | 2 | 30
2434166 | 769 | 124 | 2 | 30
2434166 | 768 | 124 | 21 | 30
2434166 | 735 | 124 | 21 | 30
2434166 | 746 | 124 | 21 | 30
2434166 | 769 | 124 | 21 | 30
2434166 | 768 | 124 | 6 | 30
2434166 | 735 | 124 | 6 | 30
2434166 | 746 | 124 | 6 | 30
2434166 | 769 | 124 | 6 | 30
2434166 | 768 | 124 | 1 | 30
2434166 | 735 | 124 | 1 | 30
2434166 | 746 | 124 | 1 | 30
2434166 | 769 | 124 | 1 | 30
I understand what is going on but I'm just wonder how do I get my desired result:
id | list1 | code| list2 | total
--------|-----------------|-----|----------|---
2434166 | 768 | 124 | 2 | 30
2434166 | 735 | 124 | 21 | 30
2434166 | 746 | 124 | 6 | 30
2434166 | 769 | 124 | 1 | 30
The Lateral Flatten function is applied to the column that holds the JSON file (need a common in between). Each object within the Name and Value Pair will be moved as independent rows once the array has been flattened. Most of the time, you're only concerned with the output's value, so we may use .
You would call it like this: select * from table(unnest(array_construct(1, 2, 3, 4, 5))); This returns a table with a single column named VALUE of type variant. You can make a version that returns strings, integers, etc.
FLATTEN is a table function that takes a VARIANT, OBJECT, or ARRAY column and produces a lateral view (i.e. an inline view that contains correlation referring to other tables that precede it in the FROM clause). FLATTEN can be used to convert semi-structured data to a relational representation.
In a FROM clause, the LATERAL keyword allows an inline view to reference columns from a table expression that precedes that inline view. A lateral join behaves more like a correlated subquery than like most JOINs.
As you noticed yourself, you want 4 records. There are 2 ways to do it, both exploit the index
column produced by flatten
, which represents the position of the produced value in the input (see the Flatten Documentation)
First way is to take the result of your query, and add these index column, here's an example:
select id,
list1_table.value::int as list1_val, list1_table.index as list1_index, code,
list2_table.value::int as list2_val, list2_table.index as list2_index, total
from my_table,
lateral flatten(input=>split(list1, ',')) list1_table,
lateral flatten(input=>split(list2, ',')) list2_table;
---------+-----------+-------------+------+-----------+-------------+-------+
ID | LIST1_VAL | LIST1_INDEX | CODE | LIST2_VAL | LIST2_INDEX | TOTAL |
---------+-----------+-------------+------+-----------+-------------+-------+
2434166 | 735 | 0 | 124 | 21 | 0 | 30 |
2434166 | 735 | 0 | 124 | 2 | 1 | 30 |
2434166 | 735 | 0 | 124 | 1 | 2 | 30 |
2434166 | 735 | 0 | 124 | 6 | 3 | 30 |
2434166 | 768 | 1 | 124 | 21 | 0 | 30 |
2434166 | 768 | 1 | 124 | 2 | 1 | 30 |
2434166 | 768 | 1 | 124 | 1 | 2 | 30 |
2434166 | 768 | 1 | 124 | 6 | 3 | 30 |
2434166 | 769 | 2 | 124 | 21 | 0 | 30 |
2434166 | 769 | 2 | 124 | 2 | 1 | 30 |
2434166 | 769 | 2 | 124 | 1 | 2 | 30 |
2434166 | 769 | 2 | 124 | 6 | 3 | 30 |
2434166 | 746 | 3 | 124 | 21 | 0 | 30 |
2434166 | 746 | 3 | 124 | 2 | 1 | 30 |
2434166 | 746 | 3 | 124 | 1 | 2 | 30 |
2434166 | 746 | 3 | 124 | 6 | 3 | 30 |
---------+-----------+-------------+------+-----------+-------------+-------+
As you can see, the rows you are interested are the ones with the same index.
So to get your result by selecting these rows after the lateral joins happen:
select id,
list1_table.value::int as list1_val, code,
list2_table.value::int as list2_val, total
from my_table,
lateral flatten(input=>split(list1, ',')) list1_table,
lateral flatten(input=>split(list2, ',')) list2_table
where list1_table.index = list2_table.index;
---------+-----------+------+-----------+-------+
ID | LIST1_VAL | CODE | LIST2_VAL | TOTAL |
---------+-----------+------+-----------+-------+
2434166 | 735 | 124 | 21 | 30 |
2434166 | 768 | 124 | 2 | 30 |
2434166 | 769 | 124 | 1 | 30 |
2434166 | 746 | 124 | 6 | 30 |
---------+-----------+------+-----------+-------+
An easier, more efficient, and more flexible way (useful if you have multiple arrays like that or e.g. array indices are related but not 1-to-1) is to flatten only on one array, and then use the index of the produced elements to lookup values in other arrays.
Here's an example:
select id, list1_table.value::int as list1_val, code,
split(list2,',')[list1_table.index]::int as list2_val, -- array lookup here
total
from my_table, lateral flatten(input=>split(list1, ',')) list1_table;
---------+-----------+------+-----------+-------+
ID | LIST1_VAL | CODE | LIST2_VAL | TOTAL |
---------+-----------+------+-----------+-------+
2434166 | 735 | 124 | 21 | 30 |
2434166 | 768 | 124 | 2 | 30 |
2434166 | 769 | 124 | 1 | 30 |
2434166 | 746 | 124 | 6 | 30 |
---------+-----------+------+-----------+-------+
See how we simply use the index produced when flattening list1
to lookup the value from list2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With