Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to select both row_number and count over partition?

I need to find duplicate record (with master record id and duplicate record ids):

select ciid, name from (
select ciid, name, row_number() over (
  partition by related_id, name order by updatedate desc) rn
) where rn = 1;

This gives me the master record IDs, but it also includes records without duplicates.

If I use

select ciid, name from (
select ciid, name, row_number() over (
  partition by related_id, name order by updatedate desc) rn
) where rn > 1;

This gets me all the duplicate records, but not the master record.

I was wishing if I do something like:

select ciid, name from (
select ciid, name, row_number()  over (
    partition by related_id, name order by updatedate desc
  ) rn, count(*)  over (
    partition by related_id, name order by updatedate desc
  ) cnt
) where rn = 1 and cnt > 1;

But I was worried about the performance, or even is it actually doing what I want.

How do I get the master record only for the ones with duplicates? Please note that name is not unique column. Only ciid is unique.

like image 564
SwiftMango Avatar asked Aug 26 '15 16:08

SwiftMango


People also ask

Can we use count with partition by?

SQL Count with Partition By clause is one of the new powerful syntax that t-sql developers can easily use. For example, while selecting a list of rows you can also calculate count of rows sharing the same field values without using subselects or SQL CTE (Common Table Expressions) in your query.

What does ROW_NUMBER () over partition by do?

PARTITION BY It is an optional clause in the ROW_NUMBER function. It is a clause that divides the result set into partitions (groups of rows). The ROW_NUMBER() method is then applied to each partition, which assigns a separate rank number to each partition.

Can you use ROW_NUMBER in WHERE clause?

The ROW_NUMBER function cannot currently be used in a WHERE clause. Derby does not currently support ORDER BY in subqueries, so there is currently no way to guarantee the order of rows in the SELECT subquery.

Can we use ROW_NUMBER without over?

The row_number() window function can be used without order by in over to arbitrarily assign a unique value to each row.


1 Answers

I ended up using similar query in my question:

select ciid, name from (
select ciid, name, row_number()  over (
    partition by related_id, name order by updatedate desc
  ) rn, count(*)  over (
    partition by related_id, name desc
  ) cnt
) where rn = 1 and cnt > 1;

Works surprisingly well. The master record is where rn = 1 and duplicates are where rn > 1. Make sure count(*) over (partition ..) cannot have order by clause.

like image 159
SwiftMango Avatar answered Oct 02 '22 13:10

SwiftMango