Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get distinct values on GROUP_CONCAT using Google Big Query

I'm trying to get distinct values when using GROUP_CONCAT in BigQuery.

I'll recreate the situation using a simpler, static example:

EDIT: I've modified the example to represent better my real situation: 2 columns with group_concat which needs to be distinct:

SELECT 
  category, 
  GROUP_CONCAT(id) as ids, 
  GROUP_CONCAT(product) as products
FROM 
 (SELECT "a" as category, "1" as id, "car" as product),
 (SELECT "a" as category, "2" as id, "car" as product),
 (SELECT "a" as category, "3" as id, "car" as product),
 (SELECT "b" as category, "4" as id, "car" as product),
 (SELECT "b" as category, "5" as id, "car" as product),
 (SELECT "b" as category, "2" as id, "bike" as product),
 (SELECT "a" as category, "1" as id, "truck" as product),
GROUP BY 
  category

This example returns:

Row category    ids products
1   a   1,2,3,1 car,car,car,truck
2   b   4,5,6   car,car,bike

I'd like to strip the duplicated values found, to return like:

Row category    ids products 
1   a   1,2,3   car,truck
2   b   4,5,6   car,bike

In MySQL, GROUP_CONCAT has a DISTINCT OPTION, but in BigQuery there isn't.

Any ideas?

like image 744
Leonardo Naressi Avatar asked Feb 20 '15 21:02

Leonardo Naressi


2 Answers

Here is solution which uses UNIQUE scope aggregation function to remove duplicates. Note, that in order to use it, first we need to build a REPEATED using NEST aggregation:

SELECT 
  GROUP_CONCAT(UNIQUE(ids)) WITHIN RECORD,
  GROUP_CONCAT(UNIQUE(products)) WITHIN RECORD 
FROM (
SELECT 
  category, 
  NEST(id) as ids, 
  NEST(product) as products
FROM 
 (SELECT "a" as category, "1" as id, "car" as product),
 (SELECT "a" as category, "2" as id, "car" as product),
 (SELECT "a" as category, "3" as id, "car" as product),
 (SELECT "b" as category, "4" as id, "car" as product),
 (SELECT "b" as category, "5" as id, "car" as product),
 (SELECT "b" as category, "2" as id, "bike" as product),
 (SELECT "a" as category, "1" as id, "truck" as product),
GROUP BY 
  category
)
like image 129
Mosha Pasumansky Avatar answered Sep 17 '22 17:09

Mosha Pasumansky


Removing duplicates before applying group_concat will achieve the result you want:

    SELECT 
      category, 
      GROUP_CONCAT(id) as ids
    FROM (  
    SELECT category, id
    FROM 
     (SELECT "a" as category, "1" as id),
     (SELECT "a" as category, "2" as id),
     (SELECT "a" as category, "3" as id),
     (SELECT "b" as category, "4" as id),
     (SELECT "b" as category, "5" as id),
     (SELECT "b" as category, "6" as id),
     (SELECT "a" as category, "1" as id),
    GROUP BY 
      category, id
    )
    GROUP BY 
      category
like image 23
Ahmed Ayad Avatar answered Sep 21 '22 17:09

Ahmed Ayad