Here is the situation:
I first need to run a query to know how many records exist.
For example: SELECT COUNT(DISTINCT userid) from users;
Often this will be all that's needed. However, sometimes (say 30% of the time) following the first query, the user will want to run a second query, detailing the records.
For example: SELECT * FROM users;
Is there any reason to run SELECT COUNT
initially instead of just SELECT
? That is, is making the count of records in SQL faster than actually pulling the records back? Or is it doing essentially the same work either way and so I should avoid doing two queries?
In other words, is it better to just always pull the records in the first query (not use COUNT
), then count the records in code (Java). If the user wants to run the second query, then great, I already have the data. If not, then just dump it.
What's the best practice here?
In other words, is it better to just always pull the records in the first query (not use COUNT ), then count the records in code (Java). If the user wants to run the second query, then great, I already have the data. If not, then just dump it. What's the best practice here? It's much faster to run the count in SQL.
However, if you only need the count, it is significantly faster to pull the count from the database than it is to actually retrieve rows. Also it is standard practice to only pull what you need. For instance, if you are counting all the rows in a table, most database implementations do not need to look at any rows.
One of the biggest and undead myths in SQL is that COUNT (*) is faster than COUNT (1). Or was it that COUNT (1) is faster than COUNT (*)? Impossible to remember, because there’s really no reason at all why one should be faster than the other. But is the myth justified? Let’s measure! How does COUNT (…) work? But first, let’s look into some theory.
The difference between count (*) and count (1) is a spin through this: But for each row, so it's probably like tens of microseconds on a large grouping! So, I was curious to see if it mattered. I ran a benchmark on the 4 most popular RDBMS, with these results: MySQL: Doesn’t matter.
If you know you need the data, go ahead and pull it and count it in code. However, if you only need the count, it is significantly faster to pull the count from the database than it is to actually retrieve rows. Also it is standard practice to only pull what you need.
For instance, if you are counting all the rows in a table, most database implementations do not need to look at any rows. Tables know how many rows they have. If the query has filters in the where
clause and it can use an index, it again will not need to look at the actual rows' data, just counts the rows from the index.
And all this is not counting the less data transferred.
A rule of thumb about database speeds is go ahead and try it for yourself. General rules are not always a good indicator. For instance, if the table was 10 rows and only a few columns, I might just pull the whole thing anyway on the off chance I needed it, since 2 round trips to the database would outweigh the cost of the query.
Two things should be considered
SELECT COUNT(DISTINCT userid) from users;
This query will go a whole lot faster with an index on userid
; If you do not have an index on userid
and none of the indexes you already have begin with userid
, then run this:
ALTER TABLE user ADD INDEX (userid);
This will make the Query Optimizer choose to look through the index rather than touch the table.
SELECT * from users;
Why bother to fetch every column in each row just to count the row?
You can replace that with
SELECT COUNT(id) FROM users;
where id is the PRIMARY KEY or
SELECT COUNT(1) FROM users;
You will have to benchmark which query is faster, SELECT COUNT(id)
or SELECT COUNT(1)
Unless you actually need the data while counting, let the counting happen in the server.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With