I have a webpage that takes an ID as a GET variable, and I need to pull the name, city, and state for that ID (stored in one table) as well as any data associated with it (stored in another table).
These are the results of a single query:
SELECT
info.name, info.city, info.state,
data.data1, data.data2, data.data3, data.data4
FROM
data_table data,
info_table info
WHERE
data.id = 12345 AND info.id = data.id
name | city | state | data1 | data2 | data3 | data4
---------------------------------------------------
test | temp | AL | 12 | 9 | 1 | 14
test | temp | AL | 63 | 8 | 1 | 6
test | temp | AL | 46 | 66 | 1 | 723
test | temp | AL | 7 | 5 | 2 | 99
test | temp | AL | 4 | 2 | 3 | 0
test | temp | AL | 2 | 11 | 1 | 1
But the column data for name, city, and state are all going to be identical for each row, so I could also do it with two queries and return the 'right' amount of data (but obviously taking twice as long to communicate with the server):
SELECT
info.name, info.city, info.state
FROM
info_table info
WHERE
info.id = 12345
name | city | state
-------------------
test | temp | AL
...and...
SELECT
data.data1, data.data2, data.data3, data.data4
FROM
data_table data
WHERE
data.id = 12345
data1 | data2 | data3 | data4
-----------------------------
12 | 9 | 1 | 14
63 | 8 | 1 | 6
46 | 66 | 1 | 723
7 | 5 | 2 | 99
4 | 2 | 3 | 0
2 | 11 | 1 | 1
So, in general, is it necessarily better to use two queries and return exactly the data I need? Or, because the data set being returned is small, should I just bite the dataset-larger-than-it-needs-to-be bullet and run a single query?
I'm guessing that every situation varies, and if the total server communication time / 2 > time to transmit extra data, then a single query is better?
Avoid the pitfalls of premature optimization, and just do the single JOIN instead of trying to do your JOIN operations client-side.
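As a rough illustration of the single-JOIN approach, the sketch below runs one query and reads the repeated name, city, and state once from the first row. It is only a sketch under stated assumptions: an in-memory SQLite database stands in for the MySQL server, the schema is guessed from the question, and `header` and `details` are hypothetical names.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the schema is guessed from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE info_table (id INTEGER PRIMARY KEY, name TEXT, city TEXT, state TEXT);
    CREATE TABLE data_table (id INTEGER, data1 INTEGER, data2 INTEGER, data3 INTEGER, data4 INTEGER);
    INSERT INTO info_table VALUES (12345, 'test', 'temp', 'AL');
    INSERT INTO data_table VALUES (12345, 12, 9, 1, 14), (12345, 63, 8, 1, 6), (12345, 46, 66, 1, 723);
""")

# One round trip: the shared columns come back on every row.
rows = conn.execute(
    """
    SELECT info.name, info.city, info.state,
           data.data1, data.data2, data.data3, data.data4
    FROM info_table AS info
    JOIN data_table AS data ON data.id = info.id
    WHERE info.id = ?
    """,
    (12345,),
).fetchall()

# name/city/state are identical on every row, so take them from the first row only.
header = rows[0][:3] if rows else None
# Keep just the per-row data columns.
details = [row[3:] for row in rows]
```

Note that with an inner join an ID that has no rows in data_table returns nothing at all, so `header` would be `None`; a LEFT JOIN from info_table would be one way to still get the name, city, and state in that case.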
If it later turns out that duplicated data is a significant strain, you have better options for addressing the issue besides doing multiple queries.
For example, result sets can be compressed, reducing the size of repeating data. The CPU overhead for the compression would likely be substantially less than attempting to do JOIN operations client-side.
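To get a feel for how well repeated columns compress, here is a small sketch using Python's zlib on a made-up payload. It does not use MySQL's own protocol compression; the row format and the 1000-row payload are assumptions chosen only to mimic a result set with a repeated name, city, and state prefix.

```python
import zlib

# Hypothetical payload: 1000 rows that all repeat the same name/city/state prefix.
rows = [f"test|temp|AL|{i}|{i * 2}|{i % 3}|{i * 7}\n" for i in range(1000)]
payload = "".join(rows).encode("utf-8")

# General-purpose compression stands in for whatever the connection would apply.
compressed = zlib.compress(payload)

raw_size = len(payload)
compressed_size = len(compressed)
ratio = compressed_size / raw_size  # fraction of the original size that remains
```

Printing `raw_size`, `compressed_size`, and `ratio` for a payload shaped like your real result set gives a quick estimate of how much of the redundancy compression would absorb.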
For a small resultset, where the amount of redundant data is insignificant, use one statement.
One of the "hidden" costs (in terms of the MySQL server) is the overhead for each statement. Each SQL statement has to be sent to the server, and MySQL has to parse and prepare each one. MySQL has to check that a statement is syntactically correct (keywords, commas, etc.), that it is semantically correct (that is, that the identifiers such as table names, column names, and function names are valid), and that the user has permission on all of the objects. After that, MySQL can produce an execution plan, evaluating different access paths (full table scan vs. using an index, the join order, and so on).
For a small resultset, it's going to be more efficient (in terms of the MySQL server) to send a single statement and return a few redundant columns than to process two separate statements and prepare and return two separate result sets.
There's network latency involved in sending the query and retrieving the resultset. The cost of doing that twice is going to outweigh the cost of doing it just once and sending a couple hundred bytes of redundant data in the resultset.
On the other hand, if the amount of redundant data is significant, it's going to consume memory and network bandwidth. In that case, or if the execution plan for the single query is not as efficient as the plans for two separate queries, running two separate queries is going to be more efficient.
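The trade-off above can be put into a back-of-the-envelope model. This is only a sketch: the function names, the parameters, and the numbers in the usage lines are all hypothetical, and the model counts only network latency and transfer time, ignoring the per-statement parse and planning overhead (which would tilt things further toward a single query).

```python
def transfer_time(round_trips: int, payload_bytes: int,
                  latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Rough estimate: latency per statement plus time on the wire."""
    return round_trips * latency_s + payload_bytes / bandwidth_bytes_per_s


def prefer_single_query(rows: int, shared_bytes: int, row_bytes: int,
                        latency_s: float, bandwidth_bytes_per_s: float) -> bool:
    """Return True when one JOIN is estimated to be no slower than two statements."""
    # One JOIN: the shared columns are repeated on every row, one round trip.
    single = transfer_time(1, rows * (shared_bytes + row_bytes),
                           latency_s, bandwidth_bytes_per_s)
    # Two statements: the shared columns are sent once, two round trips.
    split = transfer_time(2, shared_bytes + rows * row_bytes,
                          latency_s, bandwidth_bytes_per_s)
    return single <= split


# Hypothetical numbers: 1 ms latency, 1 MB/s of bandwidth.
small_case = prefer_single_query(6, 12, 16, 0.001, 1e6)
large_case = prefer_single_query(1_000_000, 1000, 16, 0.001, 1e6)
```

With these made-up inputs, the six-row case from the question favours the single query, while a million rows each carrying a kilobyte of repeated data favours splitting. Measure your own latency and row sizes before relying on either conclusion.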