
Count distinct values with OVER(PARTITION BY id)

Is it possible to count distinct values in conjunction with window functions like OVER(PARTITION BY id)? Currently my query is as follows:

SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
       congestion.id_element,
       ROW_NUMBER() OVER(
           PARTITION BY congestion.id_element
           ORDER BY congestion.date),
       COUNT(DISTINCT congestion.week_nb) OVER(
           PARTITION BY congestion.id_element
       ) AS week_count
FROM congestion
WHERE congestion.date >= '2014.01.01'
  AND congestion.date <= '2014.12.31'
ORDER BY id_element, date

However, when I try to execute the query I get the following error:

"COUNT(DISTINCT": "DISTINCT is not implemented for window functions" 
user007 asked Feb 12 '14 13:02



1 Answer

No. As the error message states, DISTINCT is not implemented for window functions. Applying the approach from this link to your case, you could compute the distinct count per id_element in a CTE and join it back to the detail rows:

WITH uniques AS (
    SELECT congestion.id_element,
           COUNT(DISTINCT congestion.week_nb) AS unique_references
    FROM congestion
    WHERE congestion.date >= '2014.01.01'
      AND congestion.date <= '2014.12.31'
    GROUP BY congestion.id_element
)
SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
       congestion.id_element,
       ROW_NUMBER() OVER(
           PARTITION BY congestion.id_element
           ORDER BY congestion.date),
       uniques.unique_references AS week_count
FROM congestion
JOIN uniques USING (id_element)
WHERE congestion.date >= '2014.01.01'
  AND congestion.date <= '2014.12.31'
ORDER BY id_element, date

Depending on the situation, you could also put a correlated subquery straight into the SELECT list:

SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
       congestion.id_element,
       ROW_NUMBER() OVER(
           PARTITION BY congestion.id_element
           ORDER BY congestion.date),
       (SELECT COUNT(DISTINCT dist_con.week_nb)
        FROM congestion AS dist_con
        WHERE dist_con.date >= '2014.01.01'
          AND dist_con.date <= '2014.12.31'
          AND dist_con.id_element = congestion.id_element) AS week_count
FROM congestion
WHERE congestion.date >= '2014.01.01'
  AND congestion.date <= '2014.12.31'
ORDER BY id_element, date
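
If you want to convince yourself that the two queries above return the same week_count, here is a minimal sketch that runs both against a toy table. It rests on several assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for the question's database (SQLite needs version 3.25 or later for ROW_NUMBER), the sample rows are invented, id_congestion is left out of the SELECT list for brevity, and the row_nb alias and the run_query helper are hypothetical names added for illustration.

```python
import sqlite3

# Invented sample rows: (date, week_nb, id_congestion, id_element)
ROWS = [
    ("2014.01.06", 2, 1, 10),
    ("2014.01.07", 2, 2, 10),
    ("2014.01.14", 3, 3, 10),
    ("2014.02.03", 6, 4, 20),
    ("2013.12.30", 1, 5, 20),  # outside the date range, should be filtered out
]

# Approach 1: aggregate in a CTE, then join the count back to the detail rows.
CTE_QUERY = """
WITH uniques AS (
    SELECT id_element, COUNT(DISTINCT week_nb) AS unique_references
    FROM congestion
    WHERE date >= '2014.01.01' AND date <= '2014.12.31'
    GROUP BY id_element
)
SELECT congestion.date, congestion.week_nb, congestion.id_element,
       ROW_NUMBER() OVER (
           PARTITION BY congestion.id_element
           ORDER BY congestion.date) AS row_nb,
       uniques.unique_references AS week_count
FROM congestion
JOIN uniques USING (id_element)
WHERE congestion.date >= '2014.01.01' AND congestion.date <= '2014.12.31'
ORDER BY congestion.id_element, congestion.date
"""

# Approach 2: correlated subquery in the SELECT list.
SUBQUERY_QUERY = """
SELECT c.date, c.week_nb, c.id_element,
       ROW_NUMBER() OVER (
           PARTITION BY c.id_element
           ORDER BY c.date) AS row_nb,
       (SELECT COUNT(DISTINCT d.week_nb)
        FROM congestion AS d
        WHERE d.date >= '2014.01.01' AND d.date <= '2014.12.31'
          AND d.id_element = c.id_element) AS week_count
FROM congestion AS c
WHERE c.date >= '2014.01.01' AND c.date <= '2014.12.31'
ORDER BY c.id_element, c.date
"""


def run_query(query, rows):
    """Load rows into a fresh in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE congestion ("
            "date TEXT, week_nb INTEGER, id_congestion INTEGER, id_element INTEGER)"
        )
        conn.executemany("INSERT INTO congestion VALUES (?, ?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_query(CTE_QUERY, ROWS):
        print(row)
    # Both approaches should agree row for row.
    print(run_query(CTE_QUERY, ROWS) == run_query(SUBQUERY_QUERY, ROWS))
```

Which of the two performs better depends on the planner and the data, so it is worth comparing the execution plans on your real table rather than relying on this toy run.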
Simo Kivistö answered Oct 24 '22 19:10