Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to Combine Arrays in Snowsql GroupBy and Only Keep Distinct Values?

I am trying to run a query that lists all the distinct markets a user has submitted to for my dataset.

The values in the Markets column are already in an array format. When I run the query below, I get an array of arrays and some markets may be listed multiple times because the distinct clause is looking at the unique arrays and not the values in the arrays.

For example if I'm trying to group ['New York'] and ['New York' , 'Chicago'], my goal is to get ['New York', 'Chicago'] as my result, but am currently getting [['New York'],['New York', 'Chicago']]. Appreciate any assistance.

SELECT 
  s.submitter_id,
  ARRAY_AGG(DISTINCT s.markets)
FROM 
  analytics.submissions AS s
GROUP BY 1
like image 436
lucas winter Avatar asked Oct 15 '22 05:10

lucas winter


2 Answers

A simple way is to first flatten the array

WITH data AS (
 SELECT submitter_id, split(markets,';') AS markets 
 FROM VALUES (1,'new york'), (1,'new york;chicargo') s(submitter_id, markets)
)
SELECT 
  a.submitter_id,
  ARRAY_AGG(DISTINCT a.market)
FROM (
    SELECT s.submitter_id
        ,f.value AS market
    FROM data AS s,
    LATERAL FLATTEN(input => s.markets) f
) AS a
GROUP BY 1;
like image 177
Simeon Pilgrim Avatar answered Nov 01 '22 08:11

Simeon Pilgrim


A variation using a javascript UDF:

WITH data AS (
 SELECT submitter_id, split(markets,';') AS markets 
 FROM VALUES (1,'new york'), (1,'new york;chicargo') s(submitter_id, markets)
)
SELECT 
  submitter_id,
  array_flat_distinct(ARRAY_AGG(distinct markets))
from data
group by 1;

Where the UDF is defined as:

create or replace function array_flat_distinct("a" array)
returns array
language javascript
as
$$
    return [...new Set(a.reduce((b,c)=>[...b,...c]))]
$$
;
like image 37
waldente Avatar answered Nov 01 '22 09:11

waldente