Is it bad for performance to select all columns?

Is it bad to SELECT all columns at once even though you probably don't need all of them? You might need them in another task, but you are too lazy to write queries for every task.

Should you write queries that SELECT only the columns you need, and run another query if you need another column?

So basically the question is: Does it have any effect on performance to SELECT one column vs multiple columns?

The query is very simple (no functions, joins, etc.). For example:

SELECT
id, name, status, date
FROM user_table
WHERE user_id = :user_id
yoshi asked Aug 02 '14 08:08



2 Answers

The issue here is less about the database server than about network communication. By selecting all columns at once, you're telling the server to return all of them to you. Concerns about I/O are addressed nicely in the question and answer @Karamba linked in a comment: select * vs select column. But for most real-world applications (and I use "applications" in every sense), the main concern is network traffic and how long it takes to serialize, transmit, and then deserialize the data. Either way, the answer is the same.

So pulling back all the columns is great if you intend to use them all, but otherwise it can be a lot of extra data transfer, particularly if you store, say, lengthy strings in your columns. In a significant majority of cases, though not all, the difference will be undetectable and is mostly a matter of principle.

It's really just a trade-off between your aforementioned laziness now (and trust me, we all feel that way) and how important performance really is.

That all said, if you do intend to use all the column values, you're much better off pulling them all back at once than you are firing off a bunch of queries.
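To make the "one query versus a bunch of queries" point concrete, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for MySQL, a made-up row for user 42, and a hypothetical `fetch_one` helper that counts each query as one round trip. It illustrates the idea and is not a benchmark.

```python
import sqlite3

# Hypothetical stand-in for the question's user_table (in-memory SQLite, not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_table (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "name TEXT, status TEXT, date TEXT)"
)
conn.execute(
    "INSERT INTO user_table (user_id, name, status, date) "
    "VALUES (42, 'yoshi', 'active', '2014-08-02')"
)

round_trips = 0


def fetch_one(sql, params):
    """Run one query and count it as one round trip to the server."""
    global round_trips
    round_trips += 1
    return conn.execute(sql, params).fetchone()


# One query that returns every column the task needs.
name, status, date = fetch_one(
    "SELECT name, status, date FROM user_table WHERE user_id = :user_id",
    {"user_id": 42},
)
single_query_trips = round_trips

# The same data fetched one column per query.
round_trips = 0
per_column = tuple(
    fetch_one(
        f"SELECT {col} FROM user_table WHERE user_id = :user_id",
        {"user_id": 42},
    )[0]
    for col in ("name", "status", "date")
)
per_column_trips = round_trips

print(single_query_trips, per_column_trips)
```

Both approaches return the same values, but the second pays the per-query overhead once per column, which on a real network usually costs more than the extra bytes saved.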

Think of it like doing a web search: you do your search, you find your page, and you only need one detail. You could read the entire page and know everything about the subject, or you could just jump to the part you're looking for and be done. The latter is a lot faster if that's all you ever want, but if you're then going to have to learn about the other aspects, you'd be way better off reading them the first time than having to do your search again and find the site all over again.

If you aren't sure whether you'll need the other column values in the future, then it's your call as the developer to judge which case is more likely.

It all depends on what your application is, what your data is, how you're using it, and how important performance really is to you.

Matthew Haugen answered Oct 22 '22 02:10


Selecting a single column can have a large effect on the performance of certain queries. For example, it is more efficient for the query engine to process an index rather than look up data in the original data pages. If a covering index is available -- that is, an index that contains all the columns needed for a query -- then the query will run faster. For large tables that are too big for available memory, the use of a covering index can be a big, big win. (Think orders of magnitude improvement in performance in some cases.)
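As a rough illustration of the covering-index case, the sketch below uses SQLite from Python as a stand-in for MySQL. The index name `idx_user_status` and the choice of indexed columns are assumptions made for the example. SQLite's `EXPLAIN QUERY PLAN` reports a covering index explicitly; in MySQL the comparable hint is `Using index` in the `Extra` column of `EXPLAIN`.

```python
import sqlite3

# In-memory SQLite stand-in for the question's table (assumption: not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_table (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "name TEXT, status TEXT, date TEXT)"
)
# Hypothetical index that contains everything the narrow query needs.
conn.execute("CREATE INDEX idx_user_status ON user_table (user_id, status)")


def plan(sql):
    """Return the query plan details as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, {"user_id": 42}).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Only indexed columns are requested, so the table itself is never read.
narrow_plan = plan("SELECT status FROM user_table WHERE user_id = :user_id")

# SELECT * also needs name and date, which forces a lookup in the table.
wide_plan = plan("SELECT * FROM user_table WHERE user_id = :user_id")

print(narrow_plan)
print(wide_plan)
```

The first plan mentions a covering index and the second does not, which is exactly the distinction that matters on tables too large to fit in memory.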

Another case where a limited number of columns is beneficial is when one or more of the columns are very large, such as a BLOB or TEXT column. These can grow in size to tens of thousands of bytes or even megabytes. Retrieving them can put a big load on the server.
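Here is a small sketch of how much a single large column can inflate a result set. It assumes an in-memory SQLite table standing in for MySQL, with a hypothetical `bio` TEXT column of 100,000 characters per row, and it measures size crudely as the character count of the returned values.

```python
import sqlite3

# In-memory SQLite stand-in; the bio column is a hypothetical large TEXT field.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_table (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "name TEXT, status TEXT, bio TEXT)"
)
big_text = "x" * 100_000
conn.executemany(
    "INSERT INTO user_table (user_id, name, status, bio) VALUES (?, ?, ?, ?)",
    [(42, f"user{i}", "active", big_text) for i in range(20)],
)


def payload_size(sql):
    """Rough size of a result set: total characters across all returned values."""
    rows = conn.execute(sql, {"user_id": 42}).fetchall()
    return sum(len(str(value)) for row in rows for value in row)


# Every column, including the large one.
wide_size = payload_size("SELECT * FROM user_table WHERE user_id = :user_id")

# Only the small columns the task actually uses.
narrow_size = payload_size(
    "SELECT id, name, status FROM user_table WHERE user_id = :user_id"
)

print(wide_size, narrow_size)
```

With 20 rows, the wide query returns about two million characters while the narrow one returns a few hundred, so leaving out the large column is where nearly all of the saving comes from.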

There is a danger in using *, if you have prepared statements and the underlying structure of the table changes. The query itself could get out-of-date (I've had this problem on other databases, but not specifically on MySQL). The underlying change could be as simple as changing the name of a column. What would be caught as a compile time error is instead a run-time error that might be much more mysterious.
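The difference in when the failure shows up can be sketched as follows. This uses SQLite from Python as a stand-in, so the exact behavior of prepared statements in other databases is not reproduced; the rename of `name` to a hypothetical `full_name` column is the assumed schema change.

```python
import sqlite3

# In-memory SQLite stand-in for the question's table (assumption: not MySQL).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access by column name
conn.execute(
    "CREATE TABLE user_table (id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT)"
)
conn.execute("INSERT INTO user_table (user_id, name) VALUES (42, 'yoshi')")

# The schema change: a column is renamed (RENAME COLUMN needs SQLite 3.25+).
conn.execute("ALTER TABLE user_table RENAME COLUMN name TO full_name")

# Explicit column list: the statement fails as soon as it is prepared,
# and the error names the missing column.
try:
    conn.execute(
        "SELECT name FROM user_table WHERE user_id = :user_id", {"user_id": 42}
    )
    explicit_error = None
except sqlite3.OperationalError as exc:
    explicit_error = str(exc)

# SELECT *: the query still succeeds, and the failure only appears later,
# when application code reads the old column name from the row.
row = conn.execute(
    "SELECT * FROM user_table WHERE user_id = :user_id", {"user_id": 42}
).fetchone()
try:
    row["name"]
    star_error = None
except IndexError as exc:
    star_error = type(exc).__name__

print(explicit_error)
print(star_error, row.keys())
```

The explicit query fails immediately with a message pointing at the column, while the `SELECT *` version fails further from the cause, in whatever code consumes the row.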

In general, the reasons given for avoiding * have more to do with network performance. In many cases, it is not going to make much difference. If you are returning 20 rows from a table where each row contains, on average, 100 or 200 bytes, then the difference between selecting all the columns and a subset of the columns will be minor in most hardware environments. The vast majority of the time spent on the query will go to compiling it, executing it in the engine, and reading the data pages. Returning 2000 bytes rather than 200 probably won't make a big difference.
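A back-of-the-envelope calculation shows why this is usually minor. The sketch assumes 20 rows, reads the 200 and 2000 byte figures as per-row sizes, and assumes a 100 Mbit/s link; it ignores latency and protocol overhead, so treat it as an order-of-magnitude estimate only.

```python
ROWS = 20
LINK_BITS_PER_SECOND = 100_000_000  # assumption: 100 Mbit/s network link


def transfer_ms(bytes_per_row):
    """Raw transfer time in milliseconds for ROWS rows of the given size."""
    total_bits = ROWS * bytes_per_row * 8
    return total_bits / LINK_BITS_PER_SECOND * 1000


subset_ms = transfer_ms(200)        # a subset of columns, 200 bytes per row
all_columns_ms = transfer_ms(2000)  # all columns, 2000 bytes per row

print(subset_ms, all_columns_ms)
```

Under these assumptions the difference is roughly 0.3 ms versus 3 ms, which is small next to the time a typical query spends being compiled and executed.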

However, there are cases (such as the ones listed above) where it can make a big difference. So, avoiding * is a good habit, but using it now and then probably isn't going to bring down your system.

Gordon Linoff answered Oct 22 '22 02:10