Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is COUNT faster than pulling the records and counting in code?

Tags:

Here is the situation:

  1. I first need to run a query to know how many records exist.

    For example: SELECT COUNT(DISTINCT userid) from users;

  2. Often this will be all that's needed. However, sometimes (say 30% of the time) following the first query, the user will want to run a second query, detailing the records.

    For example: SELECT * FROM users;

Is there any reason to run SELECT COUNT initially instead of just SELECT? That is, is making the count of records in SQL faster than actually pulling the records back? Or is it doing essentially the same work either way and so I should avoid doing two queries?

In other words, is it better to just always pull the records in the first query (not use COUNT), then count the records in code (Java). If the user wants to run the second query, then great, I already have the data. If not, then just dump it.

What's the best practice here?

like image 363
martinez314 Avatar asked Apr 09 '13 21:04

martinez314


People also ask

Is it better to count records in code (Java) or count?

In other words, is it better to just always pull the records in the first query (not use COUNT ), then count the records in code (Java). If the user wants to run the second query, then great, I already have the data. If not, then just dump it. What's the best practice here? It's much faster to run the count in SQL.

Is it faster to pull the count or rows from the database?

However, if you only need the count, it is significantly faster to pull the count from the database than it is to actually retrieve rows. Also it is standard practice to only pull what you need. For instance, if you are counting all the rows in a table, most database implementations do not need to look at any rows.

Is count (*) faster than count(1) in SQL?

One of the biggest and undead myths in SQL is that COUNT (*) is faster than COUNT (1). Or was it that COUNT (1) is faster than COUNT (*)? Impossible to remember, because there’s really no reason at all why one should be faster than the other. But is the myth justified? Let’s measure! How does COUNT (…) work? But first, let’s look into some theory.

What is the difference between count (*) and Count(1)?

The difference between count (*) and count (1) is a spin through this: But for each row, so it's probably like tens of microseconds on a large grouping! So, I was curious to see if it mattered. I ran a benchmark on the 4 most popular RDBMS, with these results: MySQL: Doesn’t matter.


2 Answers

If you know you need the data, go ahead and pull it and count it in code. However, if you only need the count, it is significantly faster to pull the count from the database than it is to actually retrieve rows. Also it is standard practice to only pull what you need.

For instance, if you are counting all the rows in a table, most database implementations do not need to look at any rows. Tables know how many rows they have. If the query has filters in the where clause and it can use an index, it again will not need to look at the actual rows' data, just counts the rows from the index.

And all this is not counting the less data transferred.

A rule of thumb about database speeds is go ahead and try it for yourself. General rules are not always a good indicator. For instance, if the table was 10 rows and only a few columns, I might just pull the whole thing anyway on the off chance I needed it, since 2 round trips to the database would outweigh the cost of the query.

like image 156
cmd Avatar answered Oct 12 '22 01:10

cmd


Two things should be considered

QUERY #1

SELECT COUNT(DISTINCT userid) from users; 

This query will go a whole lot faster with an index on userid; If you do not have an index on userid and none of the indexes you already have begin with userid, then run this:

ALTER TABLE user ADD INDEX (userid); 

This will make the Query Optimizer choose to look through the index rather than touch the table.

QUERY #2

SELECT * from users; 

Why bother to fetch every column in each row just to count the row?

You can replace that with

SELECT COUNT(id) FROM users; 

where id is the PRIMARY KEY or

SELECT COUNT(1) FROM users; 

You will have to benchmark which query is faster, SELECT COUNT(id) or SELECT COUNT(1)

EPILOGUE

Unless you actually need the data while counting, let the counting happen in the server.

like image 21
RolandoMySQLDBA Avatar answered Oct 12 '22 00:10

RolandoMySQLDBA