Is COUNT faster than pulling the records and counting in code?

Here is the situation:

1. I first need to run a query to know how many records exist. For example: SELECT COUNT(DISTINCT userid) FROM users;

2. Often this will be all that's needed. However, sometimes (say 30% of the time) following the first query, the user will want to run a second query, detailing the records. For example: SELECT * FROM users;

Is there any reason to run SELECT COUNT initially instead of just SELECT? That is, is making the count of records in SQL faster than actually pulling the records back? Or is it doing essentially the same work either way and so I should avoid doing two queries?

In other words, is it better to just always pull the records in the first query (not use COUNT), then count the records in code (Java)? If the user wants to run the second query, then great, I already have the data. If not, I just discard it.

What's the best practice here?

363

asked Apr 09 '13 21:04

martinez314

2 Answers

If you know you need the data, go ahead and pull it and count it in code. However, if you only need the count, it is significantly faster to pull the count from the database than it is to actually retrieve rows. Also it is standard practice to only pull what you need.

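For instance, if you are counting all the rows in a table, some database implementations do not need to look at any rows, because the table already knows how many rows it has (MySQL's MyISAM engine stores an exact row count, for example). Others, such as InnoDB or PostgreSQL, still have to scan an index or the table, but they do that on the server and return a single number. If the query has filters in the where clause and it can use an index, it again will not need to look at the actual rows' data; it just counts the matching entries in the index.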

And all of this is before accounting for the smaller amount of data transferred.

The best rule of thumb about database speed is to try it for yourself; general rules are not always a good indicator. For instance, if the table had 10 rows and only a few columns, I might just pull the whole thing anyway on the off chance I needed it, since two round trips to the database would cost more than the query itself.
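To make "try it for yourself" concrete, here is a minimal timing sketch. It is only an illustration under assumptions: it uses Python's built-in sqlite3 with an in-memory table as a stand-in for the real database, and the `name` and `bio` columns are hypothetical padding. An in-memory database has no network hop, so against a real server the gap would typically be larger.

```python
import sqlite3
import time

# Hypothetical stand-in for the question's `users` table, built in memory.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, userid INTEGER, name TEXT, bio TEXT)"
)
conn.executemany(
    "INSERT INTO users (userid, name, bio) VALUES (?, ?, ?)",
    ((i % 10_000, f"user{i}", "x" * 200) for i in range(20_000)),
)
conn.commit()


def count_in_database(conn):
    """Let the database count; only a single number comes back."""
    return conn.execute("SELECT COUNT(DISTINCT userid) FROM users").fetchone()[0]


def count_in_code(conn):
    """Pull every row into the client, then count distinct userids there."""
    rows = conn.execute("SELECT * FROM users").fetchall()
    return len({row[1] for row in rows})  # row[1] is the userid column


def timed(fn, conn):
    """Run fn(conn) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(conn)
    return result, time.perf_counter() - start


db_count, db_seconds = timed(count_in_database, conn)
code_count, code_seconds = timed(count_in_code, conn)
print(f"COUNT in SQL:  {db_count} users in {db_seconds:.4f}s")
print(f"count in code: {code_count} users in {code_seconds:.4f}s")
```

Both approaches return the same number; the timings are what you would compare on your own schema and data volume.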

156

answered Oct 12 '22 01:10

cmd

Two things should be considered:

QUERY #1

SELECT COUNT(DISTINCT userid) from users;

This query will go a whole lot faster with an index on userid. If you do not have an index on userid and none of the indexes you already have begin with userid, then run this:

ALTER TABLE users ADD INDEX (userid);

This lets the Query Optimizer read through the index rather than touch the table.
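One way to check whether the optimizer really switches to the index is to compare the query plan before and after creating it. The sketch below is an assumption-laden stand-in: it uses SQLite through Python's sqlite3 rather than MySQL, so the index is created with CREATE INDEX, the index name `idx_users_userid` is made up for the demo, and the plan wording differs from MySQL's EXPLAIN output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, userid INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO users (userid, name) VALUES (?, ?)",
    ((i % 100, f"user{i}") for i in range(1_000)),
)

QUERY = "SELECT COUNT(DISTINCT userid) FROM users"


def query_plan(conn, sql):
    """Return the human-readable plan steps SQLite reports for `sql`."""
    # The last column of each EXPLAIN QUERY PLAN row describes one step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = query_plan(conn, QUERY)
count_before = conn.execute(QUERY).fetchone()[0]

# SQLite spelling of adding an index on userid (hypothetical index name).
conn.execute("CREATE INDEX idx_users_userid ON users (userid)")

plan_after = query_plan(conn, QUERY)
count_after = conn.execute(QUERY).fetchone()[0]

print("without index:", plan_before)
print("with index:   ", plan_after)
```

The result of the query does not change; only the plan does. On MySQL the equivalent check would be to run EXPLAIN on the same SELECT and look at which key it reports.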

QUERY #2

SELECT * from users;

Why bother to fetch every column in each row just to count the rows?

You can replace that with

SELECT COUNT(id) FROM users;

where id is the PRIMARY KEY or

SELECT COUNT(1) FROM users;

You will have to benchmark which query is faster, SELECT COUNT(id) or SELECT COUNT(1).
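Whichever one benchmarks faster, note that the two are interchangeable only because id is the PRIMARY KEY and therefore never NULL: COUNT(expression) counts only the rows where the expression is not NULL. A small sketch, assuming SQLite via Python's sqlite3 as a stand-in and a hypothetical nullable `nickname` column added purely for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `nickname` is a hypothetical nullable column, not part of the original schema.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, nickname TEXT)")
conn.executemany(
    "INSERT INTO users (nickname) VALUES (?)",
    [("ann",), (None,), ("bob",), (None,)],
)


def scalar(conn, sql):
    """Run a query that returns one row with one column and return that value."""
    return conn.execute(sql).fetchone()[0]


count_id = scalar(conn, "SELECT COUNT(id) FROM users")              # id is never NULL
count_one = scalar(conn, "SELECT COUNT(1) FROM users")              # counts every row
count_nickname = scalar(conn, "SELECT COUNT(nickname) FROM users")  # skips NULLs

print(count_id, count_one, count_nickname)  # prints: 4 4 2
```

So counting a nullable column is a different question from counting rows, regardless of speed.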

EPILOGUE

Unless you actually need the data while counting, let the counting happen on the server.
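Applied to the workflow in the question, that advice amounts to always running the count and fetching the rows only when the user asks for them. A rough sketch of that shape, again assuming Python's sqlite3 as a stand-in for the Java/JDBC code; `UserBrowser`, its methods, and the `user_wants_details` flag are hypothetical names invented for illustration:

```python
import sqlite3


class UserBrowser:
    """Hypothetical helper: count on the server, fetch rows only on request."""

    def __init__(self, conn):
        self.conn = conn
        self.queries_run = 0  # tracks how many round trips were made

    def user_count(self):
        self.queries_run += 1
        row = self.conn.execute("SELECT COUNT(DISTINCT userid) FROM users").fetchone()
        return row[0]

    def user_details(self):
        self.queries_run += 1
        return self.conn.execute("SELECT * FROM users").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, userid INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO users (userid, name) VALUES (?, ?)",
    [(1, "ann"), (2, "bob"), (2, "bob-alt")],
)

browser = UserBrowser(conn)
total = browser.user_count()   # always needed
details = None
user_wants_details = False     # roughly 30% of the time, per the question
if user_wants_details:
    details = browser.user_details()

print(total, browser.queries_run)
```

In the common case only the cheap count query runs; the second round trip is paid only when the details are actually requested.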

answered Oct 12 '22 00:10

RolandoMySQLDBA
