How to speed up group-based duplication-count queries on unindexed tables

Question

When I need to know how many values of a certain column c occur more than n times, I can do it like this:

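DECLARE @n INT;
SET @n = 1;  -- duplicate threshold (example value)

WITH duplicateRows AS (
    SELECT COUNT(1) AS cnt  -- SQL Server requires every CTE column to have a name
    FROM [table]
    GROUP BY c
    HAVING COUNT(1) > @n
) SELECT COUNT(1) FROM duplicateRows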

This leads to unwanted behaviour: SQL Server counts all rows in every group of c, which leads to horrible performance when the table has no index.

However, altering the script so that SQL Server doesn't have to return the counts doesn't solve the problem:

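DECLARE @n INT;
SET @n = 1;  -- duplicate threshold (example value)

WITH duplicateRows AS (
    SELECT 1 AS flag  -- the per-group count is no longer returned
    FROM [table]
    GROUP BY c
    HAVING COUNT(1) > @n
) SELECT COUNT(1) FROM duplicateRows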

Although in theory SQL Server could now stop counting each group after n + 1 rows, this produces the same query plan and the same query cost.
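One way to confirm that both variants cost the same is to run them side by side with session statistics switched on and compare the logical reads and CPU time that SQL Server prints in the Messages output (enabling "Include Actual Execution Plan" in SSMS also shows the two plans together). The sketch below is self-contained: it assumes a hypothetical temp heap #big of 100,000 generated rows in place of the real table, and a hypothetical threshold @n = 1.

```sql
-- Hypothetical test data: a heap (no indexes) of 100,000 rows in which
-- each value of c from 0 to 9,999 occurs exactly 10 times.
-- Assumes the cross join of sys.all_objects yields at least 100,000 rows.
CREATE TABLE #big (c INT NOT NULL);
INSERT INTO #big (c)
SELECT rn % 10000
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
    FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b
) AS numbered
WHERE rn <= 100000;

DECLARE @n INT, @variant1 INT, @variant2 INT;
SET @n = 1;  -- hypothetical threshold

-- Print logical reads and CPU/elapsed time for each following statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Variant 1: the CTE returns the per-group count.
WITH duplicateRows AS (
    SELECT COUNT(1) AS cnt
    FROM #big
    GROUP BY c
    HAVING COUNT(1) > @n
)
SELECT @variant1 = COUNT(1) FROM duplicateRows;

-- Variant 2: the CTE returns a constant instead of the count.
WITH duplicateRows AS (
    SELECT 1 AS flag
    FROM #big
    GROUP BY c
    HAVING COUNT(1) > @n
)
SELECT @variant2 = COUNT(1) FROM duplicateRows;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

SELECT @variant1 AS variant1, @variant2 AS variant2;  -- both 10000

DROP TABLE #big;
```

Both statements have to read every row of #big and group by c before the HAVING filter can be applied, so the reported reads should match; the results are assigned to variables only so the script can report them at the end.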

Of course, the reason is that the cost comes from the GROUP BY itself, not from the counting. But I'm not at all interested in the per-group counts. Is there another way to speed up counting duplicate rows on a table without indexes?

MatBailie · Accepted Answer

The two greatest costs in your query are the re-ordering required by the GROUP BY (due to the lack of an appropriate index) and the fact that you're scanning the whole table.

Unfortunately, to identify duplicates, re-ordering the whole table is the cheapest option.

You may get a benefit from the following change, but I highly doubt it would be significant, as I'd expect the execution plan to involve a sort again anyway.

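DECLARE @n INT;
SET @n = 1;  -- duplicate threshold (example value)

WITH
  sequenced_data AS
(
  SELECT
    -- Number the rows within each fieldC group 1, 2, 3, ...
    -- SQL Server requires an ORDER BY here; any order will do.
    ROW_NUMBER() OVER (PARTITION BY fieldC ORDER BY (SELECT NULL)) AS sequence_id
  FROM
    yourTable
)
SELECT
  COUNT(*)
FROM
  sequenced_data
WHERE
  sequence_id = @n + 1  -- exactly one such row per group with more than @n rows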

Assumes SQL Server 2005 or later (ROW_NUMBER() is not available in earlier versions).
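To check that the ROW_NUMBER() rewrite returns the same number as the question's GROUP BY ... HAVING query, both formulations can be run against a small temp table. This is a sketch assuming a hypothetical table #sample with made-up data and @n = 1; it illustrates the logic only and says nothing about performance.

```sql
DECLARE @n INT;
SET @n = 1;  -- count values that occur more than once

-- Hypothetical sample data standing in for the real, unindexed table.
-- UNION ALL is used instead of multi-row VALUES to stay SQL Server 2005 compatible.
CREATE TABLE #sample (fieldC INT NOT NULL);
INSERT INTO #sample (fieldC)
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1   -- value 1: 3 rows
UNION ALL SELECT 2 UNION ALL SELECT 2            -- value 2: 2 rows
UNION ALL SELECT 3                               -- value 3: 1 row
UNION ALL SELECT 4 UNION ALL SELECT 4
UNION ALL SELECT 4 UNION ALL SELECT 4;           -- value 4: 4 rows

DECLARE @groupByCount INT, @rowNumberCount INT;

-- The question's formulation: one row per group with COUNT > @n.
WITH duplicateRows AS (
    SELECT fieldC
    FROM #sample
    GROUP BY fieldC
    HAVING COUNT(1) > @n
)
SELECT @groupByCount = COUNT(1) FROM duplicateRows;

-- The answer's formulation: a group with more than @n rows
-- contains exactly one row numbered @n + 1.
WITH sequenced_data AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY fieldC ORDER BY (SELECT NULL)) AS sequence_id
    FROM #sample
)
SELECT @rowNumberCount = COUNT(*) FROM sequenced_data WHERE sequence_id = @n + 1;

SELECT @groupByCount AS group_by_count, @rowNumberCount AS row_number_count;  -- both 3

DROP TABLE #sample;
```

With @n = 1, values 1, 2 and 4 occur more than once, so both queries report 3. Filtering on sequence_id = @n + 1 rather than sequence_id > @n is what makes each qualifying group count once.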


Tags: sql, sql-server

vstrien
