Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How Expensive is SQL ORDER BY?

Tags:

I don't quite understand how a SQL command would sort a large resultset. Is it done in memory on the fly (i.e. when a query is perfomed)?

Is is going to be faster to sort using ORDER BY in SQL rather than sort say a linked list of objects containing the results in a language like Java (assuming a fast built-in sort, probably using quicksort)?

like image 768
Joda Maki Avatar asked Feb 23 '11 22:02

Joda Maki


2 Answers

It will almost certainly be more efficient to sort the data in the database. Databases are designed to deal with large data volumes. And there are various optimizations available to the database that would not be available to the middle tier. If you plan on writing a hyper-efficient sort routine in the middle tier that takes advantage of information that you have about your data that the database doesn't (i.e. farming the data out to a cluster of dozens of middle tier machines so that the sort never spills to disk, taking advantage of the fact that your data is mostly ordered to choose an algorithm that wouldn't normally be particularly efficient), you can probably beat the database's sort speed. But that tends to be rare.

Depending on the query, for example, the database optimizer may choose a query plan that returns the data in order without performing a sort. For example, the database knows that the data in an index is sorted so it may choose to do an index scan to return the data in order without ever having to materialize and sort the entire result set. If it does have to materialized the entire result, it only needs the columns you are sorting by and some sort of row identifier (i.e. a ROWID in Oracle) rather than sorting an entire row of data like a naive middle tier implementation is likely to do. For example, if you have a composite index on (col1, col2) and you decide to sort on UPPER(col2), LOWER(col1), the database could read the col1 & col2 values from the index, sort the row identifiers, and then go fetch the data from the table. Of course, the database doesn't have to do this-- the optimizer will take into account the cost of doing a sort against the cost of fetching the data from the table or from the various indexes. The database may well conclude that the most efficient approach is to do a table scan, read the entire row into memory, and sort it. It may conclude that leveraging an index results in more I/O to fetch the data but makes up for it by reducing or eliminating the sort costs.

like image 121
Justin Cave Avatar answered Oct 21 '22 16:10

Justin Cave


The answer is... it depends. If the ORDER BY part can be done by using an index in the database, then the execution plan for the query will use that index and the results will come back in the right order straight from the DB. If not, then the database will perform the sorting, but it's likely better at it than you reading all the results into memory (and certainly better than reading the results into a linked list).

like image 39
Chris Nash Avatar answered Oct 21 '22 15:10

Chris Nash