
MODE aggregation function with redshift

I have the following tables:

customers

customer_id name
----------------
1           bob
2           alice
3           tim

purchases

id customer_id item_bought
--------------------------
1  1           hat
2  1           shoes
3  2           glasses
3  2           glasses
4  2           book
5  3           shoes
6  1           hat

And I want the following result:

customer_name item_bought_most_often
------------------------------------
bob           hat
alice         glasses
tim           shoes

I would write something like this (untested, just the idea):

SELECT customers.name as customer_name,
  MODE(item_bought) as item_bought_most_often
FROM customers
INNER JOIN purchases USING (customer_id)
GROUP BY customer_id, customers.name

However, the MODE aggregation function doesn't exist in Redshift.

It seems that Redshift user-defined functions are just regular scalar functions, not aggregate functions, so I don't think I can define it myself.

Is there a workaround?

Benjamin Crouzier asked Apr 26 '16 10:04




1 Answer

You can mimic mode() by using row_number():

select name, item_bought
from (select c.name, p.item_bought, count(*) as cnt,
             row_number() over (partition by c.name order by count(*) desc) as seqnum
      from customers c join
           purchases p
           using (customer_id)
      group by c.name, p.item_bought
     ) cp
where seqnum = 1;
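
To try the idea against the sample data from the question, here is a minimal sketch that runs a variant of the query in an in-memory SQLite database from Python. SQLite is only a stand-in for Redshift here (assumption: SQLite 3.25 or later, which supports window functions). Two details are additions that are not in the query above: the partition is on customer_id rather than name, so two customers who share a name are not merged, and item_bought is added as a tie-breaker so the result is deterministic when two items are bought equally often.

```python
import sqlite3

# In-memory SQLite database standing in for Redshift
# (assumption: SQLite >= 3.25 for window-function support).
conn = sqlite3.connect(":memory:")

# Sample data copied from the question, including the repeated id 3.
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, name TEXT);
CREATE TABLE purchases (id INTEGER, customer_id INTEGER, item_bought TEXT);
INSERT INTO customers VALUES (1, 'bob'), (2, 'alice'), (3, 'tim');
INSERT INTO purchases VALUES
  (1, 1, 'hat'), (2, 1, 'shoes'), (3, 2, 'glasses'), (3, 2, 'glasses'),
  (4, 2, 'book'), (5, 3, 'shoes'), (6, 1, 'hat');
""")

# Count purchases per (customer, item), rank the items within each customer
# by that count, and keep only the top-ranked item.
MODE_QUERY = """
SELECT name AS customer_name, item_bought AS item_bought_most_often
FROM (SELECT c.customer_id, c.name, p.item_bought,
             ROW_NUMBER() OVER (PARTITION BY c.customer_id
                                ORDER BY COUNT(*) DESC, p.item_bought) AS seqnum
      FROM customers c
      JOIN purchases p ON p.customer_id = c.customer_id
      GROUP BY c.customer_id, c.name, p.item_bought
     ) cp
WHERE seqnum = 1
ORDER BY customer_id;
"""

rows = conn.execute(MODE_QUERY).fetchall()
for customer_name, item in rows:
    print(customer_name, item)
```

With the sample data this prints bob hat, alice glasses and tim shoes, one pair per line. If every tied item should be returned instead of a single one, replacing ROW_NUMBER() with RANK() and dropping the tie-breaker is one option.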
Gordon Linoff answered Oct 24 '22 20:10
