I have a query that produces arrays of strings using the array_agg() function:
SELECT
array_agg(message) as sequence
from mytable
group by id
which produces a table that looks like this:
sequence
1 foo foo bar baz bar baz
2 foo bar bar bar baz
3 foo foo foo bar bar baz
However, I want to condense each array so that no string appears more than once in a row. For example, the desired output would look like this:
sequence
1 foo bar baz bar baz
2 foo bar baz
3 foo bar baz
How would one go about doing this with Presto SQL?
unnest is normally used with a join and expands an array into a relation (i.e. for every element of the array, a row is introduced).
The || operator performs concatenation.
This is the function to use if you want to concatenate all the values in an array field into one string value. You can specify an optional argument as a separator, and it can be any string. If you do not specify a separator, nothing is added between the values.
The ARRAY_AGG aggregator creates a new SQL.ARRAY value per group that contains the values of the group as its items. ARRAY_AGG does not preserve the order of values inside a group. If an array needs to be ordered, a LINQ OrderBy can be used. ARRAY_AGG and EXPLODE are conceptually inverse operations.
To remove all duplicate values from each array (not only consecutive repeats), you can do this in one of two ways:
Use the array_distinct function:
WITH mytable(id, message) AS (VALUES
(1, 'foo'), (1, 'foo'), (1, 'bar'), (1, 'bar'), (1, 'baz'), (1, 'baz'),
(2, 'foo'), (2, 'bar'), (2, 'bar'), (2, 'bar'), (2, 'baz'),
(3, 'foo'), (3, 'foo'), (3, 'foo'), (3, 'bar'), (3, 'bar'), (3, 'baz')
)
SELECT array_distinct(array_agg(message)) AS sequence
FROM mytable
GROUP BY id
Use the DISTINCT qualifier in the aggregation to remove the duplicate values before they are passed into array_agg:
WITH mytable(id, message) AS (VALUES
(1, 'foo'), (1, 'foo'), (1, 'bar'), (1, 'bar'), (1, 'baz'), (1, 'baz'),
(2, 'foo'), (2, 'bar'), (2, 'bar'), (2, 'bar'), (2, 'baz'),
(3, 'foo'), (3, 'foo'), (3, 'foo'), (3, 'bar'), (3, 'bar'), (3, 'baz')
)
SELECT array_agg(DISTINCT message) AS sequence
FROM mytable
GROUP BY id
Both alternatives produce the same result:
sequence
-----------------
[foo, bar, baz]
[foo, bar, baz]
[foo, bar, baz]
(3 rows)
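To make the semantics concrete outside SQL, here is a minimal Python sketch (not Presto code) of what removing all duplicates does to one group's messages. The helper name distinct_in_order is hypothetical, and keeping first-seen order is an assumption of the sketch rather than a guarantee of either query above. Applied to row 1 from the question, it shows that every repeat is dropped, including non-adjacent ones.

```python
def distinct_in_order(messages):
    """Drop every repeated value, keeping the first occurrence of each."""
    # dict keys are unique and preserve insertion order (Python 3.7+),
    # so this keeps values in the order they were first seen.
    return list(dict.fromkeys(messages))


# Row 1 from the question: "bar" and "baz" repeat, but not consecutively.
row1 = ["foo", "foo", "bar", "baz", "bar", "baz"]
print(distinct_in_order(row1))  # ['foo', 'bar', 'baz']
```

The question's desired output for that row is foo bar baz bar baz, so a global de-duplication matches the request only when repeated values always sit next to each other.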
UPDATE: You can collapse runs of consecutive repeated elements with the recently introduced MATCH_RECOGNIZE feature:
WITH mytable(id, message) AS (VALUES
(1, 'foo'), (1, 'foo'), (1, 'bar'), (1, 'baz'), (1, 'bar'), (1, 'baz'),
(2, 'foo'), (2, 'bar'), (2, 'bar'), (2, 'bar'), (2, 'baz'),
(3, 'foo'), (3, 'foo'), (3, 'foo'), (3, 'bar'), (3, 'bar'), (3, 'baz')
)
SELECT array_agg(value) AS sequence
FROM mytable
MATCH_RECOGNIZE(
PARTITION BY id
MEASURES A.message AS value
PATTERN (A B*)
DEFINE B AS message = PREV(message)
)
GROUP BY id
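As a rough model of what PATTERN (A B*) computes, the following Python sketch (not Presto code) collapses each run of equal adjacent values to its first element: A corresponds to the first row of a run, and B* to the following rows whose message equals the previous one. The helper name collapse_runs and the rows dictionary are hypothetical, and the sketch assumes each group's messages are already in the intended order.

```python
from itertools import groupby


def collapse_runs(messages):
    """Keep one element per run of equal consecutive values."""
    # groupby groups only adjacent equal items, so each key is one run.
    return [value for value, _run in groupby(messages)]


# The three groups from the question, keyed by id.
rows = {
    1: ["foo", "foo", "bar", "baz", "bar", "baz"],
    2: ["foo", "bar", "bar", "bar", "baz"],
    3: ["foo", "foo", "foo", "bar", "bar", "baz"],
}

for row_id, messages in rows.items():
    print(row_id, collapse_runs(messages))
# 1 ['foo', 'bar', 'baz', 'bar', 'baz']
# 2 ['foo', 'bar', 'baz']
# 3 ['foo', 'bar', 'baz']
```

Unlike a global de-duplication, this keeps the second bar and baz in row 1 because they are not adjacent to their earlier occurrences.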