Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it better to use one complex query or several simpler ones?

Tags:

database

Which option is better:

  1. Writing a very complex query having large number of joins, or
  2. Writing 2 queries one after the other, applying the obtained result set of the processed query on other.
like image 906
Ankur Mukherjee Avatar asked Dec 27 '22 17:12

Ankur Mukherjee


2 Answers

Generally, one query is better than two, because the optimizer has more information to work with and may be able to produce a more efficient query plan than either separately. Additionally, using two (or more) queries typically means you'll be running the second query multiple times, and the DBMS might have to generate the query plan for the query repeatedly (but not if you prepare the statement and pass the parameters as placeholders when the query is (re)executed). This means fewer back and forth exchanges between the program and the DBMS. If your DBMS is on a server on the other side of the world (or country), this can be a big factor.

Arguing against combining the two queries, you might end up shipping a lot of repetitive data between the DBMS and the application. If each of 10,000 rows in table T1 is joined with an average of 30 rows from table T2 (so there are 300,000 rows returned in total), then you might be shipping a lot of data repeatedly back to the client. If the row size of (the relevant projection of) T1 is relatively small and the data from T2 is relatively large, then this doesn't matter. If the data from T1 is large and the data from T2 is small, then this may matter; measure before deciding.

like image 149
Jonathan Leffler Avatar answered Feb 23 '23 00:02

Jonathan Leffler


When I was a junior DB person I once worked for a year in a marketing dept where I had so much free time I did each task 2 or 3 different ways. I made a habit of writing one mega-select that grabbed everything in one go and comparing it to a script that built interim tables of selected primary keys and then once I had the correct keys went and got the data values.

In almost every case the second method was faster. the cases where it wasn't were when dealing with a small number of small tables. Where it was most noticeably faster was of course large tables and multiple joins.

I got into the habit of select the required primary keys from tableA, select the required primary keys from tableB, etc. Join them and select the final set of primary keys. Use the selected primary keys to go back to the tables and get the data values.

As a DBA I now understand that this method resulted in less purging of the data cache and played nicer with others using the DB (as mentioned by Amir Raminfar).

It does however require the use of temporary tables which some places / DBA don't like (unfairly in my mind)

like image 38
Karl Avatar answered Feb 23 '23 00:02

Karl