Condensing arrays in Presto

Tags: sql, presto

I have a query that produces arrays of strings using the array_agg() function:

SELECT 
array_agg(message) as sequence
from mytable
group by id

which produces a table that looks like this:

                 sequence
1 foo foo bar baz bar baz
2     foo bar bar bar baz
3 foo foo foo bar bar baz

I want to condense each array of strings so that no value appears more than once in a row. For example, the desired output would look like this:

    sequence
1 foo bar baz bar baz
2 foo bar baz
3 foo bar baz

How would one go about doing this in Presto SQL?


asked May 28 '19 20:05

the_darkside

1 Answer

You can do this in one of two ways:

Remove duplicates from the resulting arrays using the array_distinct function:

WITH mytable(id, message) AS (VALUES
  (1, 'foo'), (1, 'foo'), (1, 'bar'), (1, 'bar'), (1, 'baz'), (1, 'baz'),
  (2, 'foo'), (2, 'bar'), (2, 'bar'), (2, 'bar'), (2, 'baz'),
  (3, 'foo'), (3, 'foo'), (3, 'foo'), (3, 'bar'), (3, 'bar'), (3, 'baz')
)
SELECT array_distinct(array_agg(message)) AS sequence
FROM mytable
GROUP BY id

Use the DISTINCT qualifier in the aggregation to remove the duplicate values before they are passed into array_agg:

WITH mytable(id, message) AS (VALUES
  (1, 'foo'), (1, 'foo'), (1, 'bar'), (1, 'bar'), (1, 'baz'), (1, 'baz'),
  (2, 'foo'), (2, 'bar'), (2, 'bar'), (2, 'bar'), (2, 'baz'),
  (3, 'foo'), (3, 'foo'), (3, 'foo'), (3, 'bar'), (3, 'bar'), (3, 'baz')
)
SELECT array_agg(DISTINCT message) AS sequence
FROM mytable
GROUP BY id

Both alternatives produce the same result:

    sequence
-----------------
 [foo, bar, baz]
 [foo, bar, baz]
 [foo, bar, baz]
(3 rows)
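
Note that both alternatives above drop every repeated value in a group, not only adjacent repeats, so an input such as foo foo bar baz bar baz would also come out as [foo, bar, baz]. If only consecutive repeats should collapse, one possible sketch uses the lag window function. It assumes a hypothetical ordering column ts (not part of the original table), because rows in a table have no inherent order:

```sql
-- Sketch only: ts is a hypothetical ordering column, not in the original table.
WITH mytable(id, ts, message) AS (VALUES
  (1, 1, 'foo'), (1, 2, 'foo'), (1, 3, 'bar'), (1, 4, 'baz'), (1, 5, 'bar'), (1, 6, 'baz'),
  (2, 1, 'foo'), (2, 2, 'bar'), (2, 3, 'bar'), (2, 4, 'bar'), (2, 5, 'baz'),
  (3, 1, 'foo'), (3, 2, 'foo'), (3, 3, 'foo'), (3, 4, 'bar'), (3, 5, 'bar'), (3, 6, 'baz')
),
marked AS (
  -- Attach the previous message within each id, in ts order.
  SELECT id, ts, message,
         lag(message) OVER (PARTITION BY id ORDER BY ts) AS prev_message
  FROM mytable
)
-- Keep the first row of each id and every row that differs from its predecessor.
SELECT id, array_agg(message ORDER BY ts) AS sequence
FROM marked
WHERE prev_message IS NULL OR message <> prev_message
GROUP BY id
ORDER BY id
```

With this sample data the query should return [foo, bar, baz, bar, baz] for id 1 and [foo, bar, baz] for ids 2 and 3. NULL messages are not handled specially in this sketch.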

UPDATE: To collapse only consecutive repeats of an element, you can use the MATCH_RECOGNIZE feature, which is available in recent Trino (formerly PrestoSQL) releases:

WITH mytable(id, message) AS (VALUES
  (1, 'foo'), (1, 'foo'), (1, 'bar'), (1, 'baz'), (1, 'bar'), (1, 'baz'),
  (2, 'foo'), (2, 'bar'), (2, 'bar'), (2, 'bar'), (2, 'baz'),
  (3, 'foo'), (3, 'foo'), (3, 'foo'), (3, 'bar'), (3, 'bar'), (3, 'baz')
)
SELECT array_agg(value) AS sequence
FROM mytable
MATCH_RECOGNIZE(
    PARTITION BY id
    MEASURES A.message AS value
    PATTERN (A B*)
    DEFINE B AS message = PREV(message)
)
GROUP BY id
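
The query above has no ORDER BY inside MATCH_RECOGNIZE, so the row order within each partition, and therefore which rows count as adjacent, is not guaranteed. A variant that makes the order explicit might look like the following. It assumes a hypothetical ordering column ts, and the measure names msg and first_ts are illustrative choices, not part of the original answer:

```sql
-- Sketch only: ts is a hypothetical ordering column, not in the original table.
WITH mytable(id, ts, message) AS (VALUES
  (1, 1, 'foo'), (1, 2, 'foo'), (1, 3, 'bar'), (1, 4, 'baz'), (1, 5, 'bar'), (1, 6, 'baz'),
  (2, 1, 'foo'), (2, 2, 'bar'), (2, 3, 'bar'), (2, 4, 'bar'), (2, 5, 'baz'),
  (3, 1, 'foo'), (3, 2, 'foo'), (3, 3, 'foo'), (3, 4, 'bar'), (3, 5, 'bar'), (3, 6, 'baz')
)
SELECT id, array_agg(msg ORDER BY first_ts) AS sequence
FROM mytable
MATCH_RECOGNIZE(
    PARTITION BY id
    ORDER BY ts                       -- defines which rows are adjacent
    MEASURES
        A.message AS msg,             -- first message of each run
        A.ts AS first_ts              -- kept so the runs can be ordered when aggregating
    PATTERN (A B*)                    -- one row, then any number of repeats of it
    DEFINE B AS message = PREV(message)
)
GROUP BY id
ORDER BY id
```

Each match covers one run of identical messages and emits a single row, so the aggregated array keeps one element per run.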

answered Sep 27 '22 22:09

Martin Traverso
