Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient way of getting @@rowcount from a query using row_number

I have an expensive query using the row_number over() functionality in SQL Server 2005. I return only a sub list of those records as the query is paginated. However, I would like to also return the total number of records, not just the paginated subset. Running the query effectively twice to get the count is out of the question.

Selecting count(*) is also out of the question as the performance is absolutely terrible when I've tried this.

What I'd really love is @@ROW_NUMBERROWCOUNT :-)

like image 374
Phil Bennett Avatar asked Jun 24 '09 13:06

Phil Bennett


People also ask

What is SELECT @@ rowcount?

Usage. SQL Server @@ROWCOUNT is a system variable that is used to return the number of rows that are affected by the last executed statement in the batch.

Is distinct faster than ROW_NUMBER?

In my experience, an aggregate (DISTINCT or GROUP BY) can be quicker then a ROW_NUMBER() approach. Saying that, ROW_NUMBER is better with SQL Server 2008 than SQL Server 2005. However, you'll have to try for your situation.

How do I display row numbers with records in SQL?

If you'd like to number each row in a result set, SQL provides the ROW_NUMBER() function. This function is used in a SELECT clause with other columns. After the ROW_NUMBER() clause, we call the OVER() function. If you pass in any arguments to OVER , the numbering of rows will not be sorted according to any column.

What is ROW_NUMBER () function in SQL?

ROW_NUMBER function is a SQL ranking function that assigns a sequential rank number to each new record in a partition. When the SQL Server ROW NUMBER function detects two identical values in the same partition, it assigns different rank numbers to both.


1 Answers

Over the years a pile of developer sweat has gone into efficiently paging result sets. Yet, there is no one answer--it depends on your use case. Part of the use case is getting your page efficiently, part is figuring out how many rows are in a complete result set. So sorry if i stray a little into paging, but the two are pretty tightly coupled in my mind.

There are a lot of strategies, most of which are bad if you have any sort of data volume & don't fit the use case. While this isn't a complete list, following are some of the options.....

Run Separate Count(*)

  • run a separate query that does a simple "select count(*) from MyTable"
  • simple and easy for a small table
  • good on an unfiltered large table that is either narrow or has a compact non-clustered index you can use
  • breaks down when you have a complicated WHERE/JOIN criteria because running the WHERE/JOIN twice is expensive.
  • breaks down on a wide index because the number of reads goes up.

Combine ROW_Number() OVER() and COUNT(1) OVER(PARTITION By 1)

  • This was suggested by @RBarryYoung. It has the benefit of being simple to implement and very flexible.
  • The down side is that there are a lot of reasons this can become extremely expensive quickly.
  • For example, in a DB i'm currently working there is a Media table with about 6000 rows. It's not particularly wide, has a integer clustered PK and, as well as a compact unique index. Yet, a simple COUNT(*) OVER(PARTITION BY 1) as TotalRows results in ~12,000 reads. Compare that to a simple SELECT COUNT(*) FROM Media -- 12 reads. Wowzers.

UPDATE -- the reads issue I mentioned is a bit of red-herring. It turns out, that with windowed functions the unit used to measure reads is kind of mixed. The net result is what appears to be massive numbers of reads. You can see more on the issue here : Why are logical reads for windowed aggregate functions so high?

Temp Tables / Table Variables

  • There are lots of strategies that take a result set and insert relevant keys or segments of results into temp tables / table variables.
  • For small/medium sized result sets this can provide great results.
  • This type of strategy works across almost any platform/version of SQL.
  • Operating on a result set multiple times (quite often a requirement) is also easy.
  • The down side is when working with large results sets ... inserting a few million rows into a temp table has a cost.
  • Compounding the issue, in a high volume system pressure on TempDB can be quite a factor, and temp tables are effectively working in TempDB.

Gaussian Sum / Double Row Number

  • This idea relies on subset of something the mathematician Gauss figured out (how to sum a series of numbers). The subset is how to get row count from any point in the table.
  • From a series of numbers (Row_Number()) the row count for 1 to N is (N + 1) - 1. More explanation in the links.
  • The formula seems like it would net out to just N, but the if you stick with the formula an interesting things happens, you can figure out row count from a page in the middle of the table.
  • The net result is you do ROW_Number() OVER(Order by ID) and ROW_Number() OVER(Order by ID DESC) then sum the two numbers and subtract 1.
  • Using my Media table as an example my reads dropped from 12,000 to about 75.
  • In a larger page you've ended up repeating data many many times, but the offset in reads may be worth it.
  • I haven't tested this on too many scenarios, so it may fall apart in other scenarios.

Top (@n) / SET ROWCOUNT

  • These aren't specific strategies per-se, but are optimizations based on what we know about the query optimizer.
  • Creatively using Top(@n) [top can be a variable in SQL 2008] or SET ROWCOUNT can reduce your working set ...even if you're pulling a middle page of a result set you can still narrow the result
  • These ideas work because of query optimizer behavior ...a service pack/hotfix can change the behavior (although probably not).
  • In certian instances SET ROWCOUNT can be a bit in accurate
  • This strategy doesn't account for getting the full row count, just makes paging more efficient

So what's a developer to do?

Read my good man, read. Here are some articles that I've leaned on...

  • A More Efficient Method for Paging Through Large Result Sets
  • Optimising Server-Side Paging - Part I
  • Optimising Server-Side Paging - Part II
  • Explaination of the Gaussian Sum
  • Returning Ranked Results with Microsoft SQL Server 2005
  • ROW_NUMBER() OVER Not Fast Enough With Large Result Set
  • Retrieving the First N Records from a SQL Query
  • Server Side Paging using SQL Server 2005
  • Why are logical reads for windowed aggregate functions so high?

Hope that helps.

like image 191
EBarr Avatar answered Sep 23 '22 10:09

EBarr