Better to run 2 sql queries or 1 and deal with duplicate result set?

I have a webpage that takes an ID as a GET variable. For that ID, I need to pull the name, city, and state (stored in one table) as well as any data associated with it (stored in another table).

These are the results of a single query:

SELECT
    info.name, info.city, info.state,
    data.data1, data.data2, data.data3, data.data4
FROM
    data_table data,
    info_table info
WHERE
    data.id = 12345 AND info.id = data.id

name | city | state | data1 | data2 | data3 | data4
---------------------------------------------------
test | temp | AL    | 12    | 9     | 1     | 14
test | temp | AL    | 63    | 8     | 1     | 6
test | temp | AL    | 46    | 66    | 1     | 723
test | temp | AL    | 7     | 5     | 2     | 99
test | temp | AL    | 4     | 2     | 3     | 0
test | temp | AL    | 2     | 11    | 1     | 1

But the values for name, city, and state are going to be identical in every row, so I could also do it with two queries and return the 'right' amount of data (at the cost of a second round trip to the server):

SELECT
    info.name, info.city, info.state
FROM
    info_table info
WHERE
    info.id = 12345

name | city | state
-------------------
test | temp | AL   

...and...

SELECT
    data.data1, data.data2, data.data3, data.data4
FROM
    data_table data
WHERE
    data.id = 12345

data1 | data2 | data3 | data4
-----------------------------
12    | 9     | 1     | 14
63    | 8     | 1     | 6
46    | 66    | 1     | 723
7     | 5     | 2     | 99
4     | 2     | 3     | 0
2     | 11    | 1     | 1

So, in general, is it necessarily better to use 2 queries and return exactly the amount of data I need? Or, because the data set being returned is small, should I just bite the dataset-larger-than-it-needs-to-be bullet and run a single query?

I'm guessing that every situation varies, and that a single query is better whenever the extra round trip to the server (half of the total communication time for two queries) takes longer than transmitting the extra data?

WOUNDEDStevenJones asked Feb 07 '14 21:02


2 Answers

Avoid the pitfalls of premature optimization, and just do the single JOIN instead of trying to do your JOIN operations client-side.

If it later turns out that duplicated data is a significant strain, you have better options for addressing the issue besides doing multiple queries.

For example, result sets can be compressed, reducing the size of repeating data. The CPU overhead for the compression would likely be substantially less than attempting to do JOIN operations client-side.
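To get a rough feel for how well repeated columns compress, here is a minimal sketch using Python's zlib. It is only an illustration: the pipe-delimited serialization, the generated data values, and the row count are assumptions, and a real MySQL wire format (or its compressed client protocol) will give different numbers.

```python
import zlib

# Assumption: the shared columns from the question, repeated on every row.
SHARED = ("test", "temp", "AL")

def make_data_rows(count):
    # Hypothetical, deterministic stand-in values for data1..data4.
    return [(i, (i * 7) % 101, i % 3, (i * 13) % 997) for i in range(count)]

def serialize(rows):
    # Naive text serialization; NOT the real MySQL wire format.
    return "\n".join("|".join(str(v) for v in row) for row in rows).encode()

data_rows = make_data_rows(1000)
joined = serialize([SHARED + row for row in data_rows])  # single JOIN result
data_only = serialize(data_rows)                         # second of two queries

# Extra bytes caused by the repeated columns, before and after compression.
raw_overhead = len(joined) - len(data_only)
compressed_overhead = len(zlib.compress(joined)) - len(zlib.compress(data_only))

print("raw overhead (bytes):       ", raw_overhead)
print("compressed overhead (bytes):", compressed_overhead)
```

The point is only the comparison: the bytes added by the repeated name/city/state columns shrink a lot once the stream is compressed, so the "wasted" bandwidth of the JOIN is smaller than the raw row size suggests.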

Michael Fredrickson answered Sep 29 '22 01:09


For a small resultset, where the amount of redundant data is insignificant, use one statement.

One of the "hidden" costs (in terms of the MySQL server) is the overhead for each statement. Each SQL statement has to be sent to the server, and MySQL has to parse and prepare it. MySQL has to check that the statement is syntactically correct (keywords, commas, etc.) and semantically correct: that the identifiers (table names, column names, function names) are valid and that the user has permission on all of the objects. After that, MySQL can produce an execution plan, evaluating different access paths (full table scan vs. using an index, the join order, and so on).
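The per-statement work described above can be observed from the client side. The sketch below uses Python's built-in sqlite3 module as a stand-in, which is an assumption made so the example runs without a server; MySQL goes through the same kinds of stages, but its error messages and EXPLAIN output look different. The index name and the helper try_statement are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE info_table (id INTEGER PRIMARY KEY, name TEXT, city TEXT, state TEXT);
    CREATE TABLE data_table (id INTEGER, data1 INTEGER, data2 INTEGER,
                             data3 INTEGER, data4 INTEGER);
    CREATE INDEX data_table_id ON data_table (id);
""")

def try_statement(sql):
    """Return 'ok', or the error message the engine raises for this statement."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.OperationalError as exc:
        return str(exc)

# Syntax check: the stray comma is rejected by the parser.
syntax_result = try_statement("SELECT name, FROM info_table")
# Semantic check: the statement parses, but the identifier does not exist.
semantic_result = try_statement("SELECT no_such_column FROM info_table")
# A valid statement passes both checks and is executed.
valid_result = try_statement("SELECT name FROM info_table WHERE id = 12345")

# Ask the engine which access path it chose for a statement.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT data1 FROM data_table WHERE id = 12345"
).fetchall()

print(syntax_result)
print(semantic_result)
print(valid_result)
for step in plan:
    print(step)
```

Every statement you send pays for these checks and for planning, which is the fixed cost that two queries incur twice.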

For a small result set, it's going to be more efficient (in terms of the MySQL server) to send a single statement and return a few redundant columns than to process two separate statements and prepare and return two separate result sets.
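As a minimal sketch of the single-statement approach, the code below runs one JOIN and then reads the shared columns from the first row only. It uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption made so the example is self-contained); the table and column names come from the question, and the explicit JOIN ... ON syntax is equivalent to the comma join used there.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE info_table (id INTEGER PRIMARY KEY, name TEXT, city TEXT, state TEXT);
    CREATE TABLE data_table (id INTEGER, data1 INTEGER, data2 INTEGER,
                             data3 INTEGER, data4 INTEGER);
    INSERT INTO info_table VALUES (12345, 'test', 'temp', 'AL');
    INSERT INTO data_table VALUES
        (12345, 12, 9, 1, 14), (12345, 63, 8, 1, 6), (12345, 46, 66, 1, 723),
        (12345, 7, 5, 2, 99), (12345, 4, 2, 3, 0), (12345, 2, 11, 1, 1);
""")

# One statement, one round trip; the ID is passed as a bound parameter.
rows = conn.execute("""
    SELECT info.name, info.city, info.state,
           data.data1, data.data2, data.data3, data.data4
    FROM info_table info
    JOIN data_table data ON data.id = info.id
    WHERE info.id = ?
""", (12345,)).fetchall()

# name/city/state repeat on every row, so take them once from the first row.
info = rows[0][:3] if rows else None
# The remaining columns are the per-row data.
data = [row[3:] for row in rows]

print(info)
for row in data:
    print(row)
```

One thing to watch: with an inner join, an ID that has no rows in data_table returns nothing at all, including the name, city, and state. If that case matters, a LEFT JOIN from info_table keeps the info row and returns NULLs for the data columns.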

There's network latency involved in sending the query and retrieving the result set. Paying that latency twice costs more than paying it once and sending a couple hundred bytes of redundant data in the result set.

On the other hand, running two separate queries is going to be more efficient if the amount of redundant data is significant enough to consume real memory and network bandwidth, or if the execution plan for the combined query is less efficient than the plans for the two separate queries.
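The trade-off can be written down as a back-of-the-envelope comparison. This is a deliberately crude sketch: the function name, its parameters, and the example numbers are all hypothetical, it assumes the two queries run one after the other, and it ignores memory use and differences in execution plans. Measure your own setup before relying on it.

```python
def one_query_is_cheaper(round_trip_s, per_statement_overhead_s,
                         redundant_bytes, bandwidth_bytes_per_s):
    """Rough model: compare the cost of a second statement with the cost
    of shipping the redundant columns in a single joined result set."""
    # Time spent transmitting the duplicated name/city/state values.
    extra_transfer_s = redundant_bytes / bandwidth_bytes_per_s
    # Time spent on the second round trip plus parsing/planning it.
    extra_statement_s = round_trip_s + per_statement_overhead_s
    return extra_transfer_s < extra_statement_s

# Hypothetical numbers: 0.5 ms round trip, 0.1 ms statement overhead, 10 MB/s link.
small_case = one_query_is_cheaper(0.0005, 0.0001, 200, 10_000_000)
large_case = one_query_is_cheaper(0.0005, 0.0001, 50_000_000, 10_000_000)

print("couple hundred redundant bytes -> single query cheaper?", small_case)
print("50 MB of redundant data        -> single query cheaper?", large_case)
```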

spencer7593 answered Sep 29 '22 01:09