Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server DISTINCT pagination with ROW_NUMBER() not distinct

Ok, so basically my DISTINCT is rendered useless by using ROW_NUMBER(), I need to avoid this as it's causing duplicate results (apart from a unique num of course!)

So what I'm looking for is a query that'll work the same but not have duplicate rows because num is unique:

WITH t AS
(
   SELECT DISTINCT *, ROW_NUMBER() OVER (ORDER BY Date) AS num 
   FROM Original_Import 
   LEFT JOIN eqcas.dbo.BASE_PROXY_VIEW_WITHTARGET ON ADName = Targetuser
   WHERE (BaseProxy = 'agmc' OR ADName = 'agmc')
      AND (Commited IS NULL OR Commited = 0)
)
SELECT DISTINCT ID, num, ADName, Description_User, Description_Amex, Amount, Date
FROM t 
WHERE (t.BaseProxy = 'agmc' OR t.ADName = 'agmc') 
   AND num BETWEEN 0 AND 20
   AND (Commited IS NULL OR Commited = 0)
ORDER BY Date

It's probably fairly trivial to fix, but seeing as I'm not a SQL Server guy I'm not used to these inner queries etc..

Update: yes, num is used for paging.

like image 540
John Hunt Avatar asked Nov 17 '11 00:11

John Hunt


2 Answers

Seems like I'm two years late with my recent blog post about ROW_NUMBER() being to SELECT what DENSE_RANK() is to SELECT DISTINCT. Your CTE would have to be replaced by this:

WITH t AS
(
   SELECT DISTINCT *, DENSE_RANK() OVER (ORDER BY Date, ...) AS num 
   FROM Original_Import 
   LEFT JOIN eqcas.dbo.BASE_PROXY_VIEW_WITHTARGET ON ADName = Targetuser
   WHERE (BaseProxy = 'agmc' OR ADName = 'agmc')
      AND (Commited IS NULL OR Commited = 0)
)
SELECT ...

In the above query, the DENSE_RANK()'s ORDER BY clause will need to list all columns from Original_Import and from BASE_PROXY_VIEW_WITH_TARGET, to reproduce the same ordering as the DISTINCT keyword. That will assign exactly one rank per duplicate record set, such that DISTINCT will work again.

In the referenced blog post, I have also included a link to a SQLFiddle illustrating this in a more trivial example.

SELECT DISTINCT
  v, 
  DENSE_RANK() OVER (w) row_number
FROM t
WINDOW w AS (ORDER BY v)
ORDER BY v, row_number

An explanation why DISTINCT removes duplicate rows after window functions having been calculated can be found in this post here.

like image 143
Lukas Eder Avatar answered Nov 11 '22 10:11

Lukas Eder


limited confidence on this as I can't test or even compile but this is the general gist of what I was thinking...

WITH t AS
(
   SELECT [insert your fields here], ROW_NUMBER() OVER (ORDER BY Date) AS num 
   FROM (
     SELECT DISTINCT *[insert your fields here]
     FROM Original_Import 
     LEFT JOIN eqcas.dbo.BASE_PROXY_VIEW_WITHTARGET ON ADName = Targetuser
     WHERE (BaseProxy = 'agmc' OR ADName = 'agmc')
      AND (Commited IS NULL OR Commited = 0)) as X
)
SELECT DISTINCT ID, num, ADName, Description_User, Description_Amex, Amount, Date
FROM t 
WHERE (t.BaseProxy = 'agmc' OR t.ADName = 'agmc') 
   AND num BETWEEN 0 AND 20
   AND (Commited IS NULL OR Commited = 0)
ORDER BY Date
like image 24
xQbert Avatar answered Nov 11 '22 10:11

xQbert