
ResultSet and Select * Performance

I am refactoring some Spring JDBC code in which some of the costlier queries do "SELECT * FROM ...", and I was about to start checking which columns were actually needed so I could just SELECT x, y FROM ... them. But reading through the ResultSet class, it seemed that most data is lazily loaded: when you call ResultSet.next() it moves the cursor in the database (Oracle 10g in this application), and when you call ResultSet.getXX() it retrieves that column. So my thought was that if you do a "SELECT *" but only retrieve the columns you want, you are not really taking a performance hit. Am I thinking about this correctly? The only place I can think of where this hurts you is inside the database, because it stores the query results in memory and has to use more memory than it would if only a few columns were selected. But if it is actually only storing pointers to the columns that hit the query, then even this wouldn't be the case.

Thoughts?

NOTE: this only applies to a standard ResultSet; I know CachedResultSet acts differently.

Gandalf asked Jul 24 '09 16:07


2 Answers

I would be surprised if going from "SELECT *" to "SELECT A,B,C" gave you any meaningful performance improvement, unless you had a huge number of columns that you didn't need.

This is all very dependent on your database, your driver and your application, and most generalisations are going to be pretty meaningless.

The only reliable answer you're going to get from this is by benchmarking it - try "SELECT *", try "SELECT A,B,C", and see if there's improvement worth chasing.
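A minimal sketch of such a benchmark in plain JDBC might look like the following. The class name `QueryBench`, its method names, and the choice of taking the median over several runs are assumptions for illustration only; the `Connection` and the two SQL strings would come from your own application.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;

final class QueryBench {

    // Runs the query and walks every row, reading only the first column,
    // so the timing covers execution plus fetching all rows from the server.
    static long timeQueryNanos(Connection conn, String sql) throws SQLException {
        long start = System.nanoTime();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                rs.getObject(1);
            }
        }
        return System.nanoTime() - start;
    }

    // The median is less sensitive than the mean to a single slow run.
    static long median(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        if (sorted.length % 2 == 1) {
            return sorted[mid];
        }
        // Even count: midpoint of the two middle values.
        return sorted[mid - 1] + (sorted[mid] - sorted[mid - 1]) / 2;
    }

    // Times the same query several times and returns the median in nanoseconds.
    static long medianQueryNanos(Connection conn, String sql, int runs) throws SQLException {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            samples[i] = timeQueryNanos(conn, sql);
        }
        return median(samples);
    }
}
```

You would call `medianQueryNanos` once with the "SELECT *" text and once with the "SELECT A,B,C" text against the same connection, ideally after a few warm-up runs, and compare the two numbers.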

skaffman answered Sep 21 '22 17:09


Depending on the table structure, the Oracle version, and the indexes involved, it is entirely possible that changing the set of columns you are selecting would substantially improve performance by changing query plans for the better. For most queries, the performance benefits may well be minimal, but overall it is generally good practice to name columns explicitly.

The simplest case where performance will be improved will occur when you have a "covered index" that the optimizer could use. If all the columns you are selecting and all the columns you are filtering by are part of a single index, that index is a covered index for the query. In that case, Oracle can avoid ever reading the data from the table and can just read the index.

There are other cases where performance will be improved as well. The optimizer may be able to perform table elimination if you have queries in which there are interim joins that don't affect the eventual output. If you are selecting all the columns, that optimization isn't possible. If you have tables with chained rows, eliminating columns can also eliminate the need to fetch the additional blocks where the eliminated columns reside. If there are LONG and LOB columns in the table, not selecting those columns would also result in large improvements.

Finally, eliminating columns will generally reduce the amount of space Oracle will require to sort and hash results before shipping them over the wire. And even though the ResultSet may lazily load data in the application server's RAM, it is probably not able to lazily fetch columns over the network. If you select all the columns from the table, the JDBC driver likely has to fetch at least 1 complete row at a time (more likely it is fetching 10 or 100 rows per network round-trip). And since the driver doesn't know, at the time the data is fetched, which columns are going to be requested, all of the data has to be shipped over the network.
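To make the network point concrete, here is a hedged sketch in plain JDBC that names only the needed columns and sets the fetch-size hint. The table `orders`, the columns `id` and `status`, and the class and method names are all hypothetical; `setFetchSize` is a standard JDBC call, but it is only a hint and a driver may ignore it.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NarrowSelect {

    // Builds "SELECT a, b FROM t" from an explicit column list.
    // Identifiers are concatenated as-is, so pass only trusted, hard-coded names.
    static String selectColumns(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        return "SELECT " + String.join(", ", columns) + " FROM " + table;
    }

    // Reads the status column of the hypothetical orders table for every row.
    static List<String> loadStatuses(Connection conn) throws SQLException {
        String sql = selectColumns("orders", Arrays.asList("id", "status"));
        List<String> statuses = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setFetchSize(100); // hint for how many rows to fetch per round trip
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    statuses.add(rs.getString("status"));
                }
            }
        }
        return statuses;
    }
}
```

With the column list narrowed, each batch of rows the driver fetches carries only `id` and `status`, rather than every column in the table.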

Justin Cave answered Sep 22 '22 17:09