Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is using COUNT(*) or SELECT * a good idea?

Tags:

sql

database

I've heard several times that you shouldn't perform COUNT(*) or SELECT * for performance reasons, but wasn't able to dig up some further information about it.

I can imagine that the database is then using all columns for the action, which can be an impressive performance loss, but I'm not sure about that. Does somebody have further information about the topic?

like image 930
Bobby Avatar asked Dec 14 '10 08:12

Bobby


People also ask

Why select * is not recommended?

Avoid using SELECT * There are many reasons for that recommendation, like: SELECT * Retrieves unnecessary data besides that it may increase the network traffic used for your queries. When you SELECT *, it is possible to retrieve two columns of the same name from two different tables (when using JOINS for example).

Is select * faster than selecting columns?

Selecting distinct and less than all columns will always be faster than selecting *.

What is the difference between select * and select COUNT (*)?

Select * Would return the entire table while Select Count(*) would return the number of rows.

Which is better COUNT 1 or COUNT (*)?

The simple answer is no – there is no difference at all. The COUNT(*) function counts the total rows in the table, including the NULL values. The semantics for COUNT(1) differ slightly; we'll discuss them later. However, the results for COUNT(*) and COUNT(1) are identical.


2 Answers

1. On count(*) vs. count(something else)

SQL is declarative in that you specify what you want. This is different from specifying how to get what you want. That means the database engine is free to realize your query in whatever way it thinks is the most efficient. Many database optimizers rewrites your query to a less costly alternative (if such a plan is available).

Given the following table:

table(
   pk       not null
  ,color    not null
  ,nullable null
  ,unique(pk)
  ,index(color)
);

...all of the following are functionally equivalent (due to the mechanics of count and nulls):

1) select count(*) from table;
2) select count(1) from table;
3) select count(pk) from table;
4) select count(color) from table;

Regardless of which form you use, the optimizer is free to rewrite the query to another form if it is more efficient. (Again, not all optimizers are sophisticated enough to do this). The unique index(pk) would be smaller (bytes occupied) than the entire table. Therefore it would be more efficient to count the number of index entries rather than scanning through the entire table. In Oracle we have bitmap indexes, which also compress repeating strings. If we had used such an index on the color column, it would probably have been the smallest index to scan. Oracle also supports table compression which in some cases makes the physical table smaller than a composite index.

1. TL;DR; Your specific dbms will have its own set of tools that enables different rewriting rules and in turn execution plans. That renders the question somewhat useless (unless we talk about a specific release of a specific dbms). I recommend COUNT(*) in all cases because it requires the least cognitive effort to grasp.

2. On select a,b,c vs. select *

There are very few valid uses of SELECT * in code you write and put into production. Imagine a table which contains Bluray movies (yes, the movies is stored as a blob in this table). So you slapped together your awesomesauce abstraction layer and put SELECT * FROM movies where id = ? in the getMovies(movie_id) method. I will refrain myself from explaining why SELECT name FROM movies will be transported across the network just a tad faster. Of course, in most realistic cases it won't have a noticable impact.

One last point on performance is that when all the referenced columns (selected, filtered) in your query exists as an index (called a covering index), the database need not touch the table at all. It can be fully resolved from scanning the index only. By selecting all columns you remove this option from the optimizer.

Another thing about SELECT * which is far more serious than anything, is that it creates an implicit dependency on a specific physical layout of the table. Let me explain. Consider the following tables:

table T1(name, id)
table T2(name, id)

The following statement...

insert into t1 select * from t2;

... will break or produce a different result if any of the following happens:

  • Any of the tables columns are rearranged for example T1(id, name)
  • T1 gets an additional not-null column
  • T2 gets another column

2. TL;DR; When possible, explicitly specify the columns you want (eventually, you'll have to do that anyway). Also, selecting fewer columns are faster than selecting more columns. A possitive side-effect on explicit selects is that it gives greater freedom to the optimizer.

like image 77
Ronnis Avatar answered Oct 26 '22 22:10

Ronnis


COUNT(*) is different from COUNT(column1) !
COUNT(*) returns the number of records, and does NOT use more resources, while COUNT(column1) counts the number of records where column1 is non null.

For SELECT, it is different. SELECT * will of course request more data.

like image 42
iDevlop Avatar answered Oct 27 '22 00:10

iDevlop