Should I use a JOIN function or run several queries in a loop structure?

Tags:

I have this 2 mysql tables: TableA and TableB

TableA
* ColumnAId
* ColumnA1
* ColumnA2
TableB
* ColumnBId
* ColumnAId
* ColumnB1
* ColumnB2

In PHP, I wanted to have this multidimensional array format

$array = array(
    array(
        'ColumnAId' => value,
        'ColumnA1' => value,
        'ColumnA2' => value,
        'TableB' => array(
            array(
                'ColumnBId' => value,
                'ColumnAId' => value,
                'ColumnB1' => value,
                'ColumnB2' => value
            )
        )
    )
);

so that I can loop it in this way

foreach($array as $i => $TableA) {
    echo 'ColumnAId' . $TableA['ColumnAId'];
    echo 'ColumnA1' . $TableA['ColumnA1'];
    echo 'ColumnA2' . $TableA['ColumnA2'];
    echo 'TableB\'s';
    foreach($value['TableB'] as $j => $TableB) {
        echo $TableB['...']...
        echo $TableB['...']...
    }
}

My problem is that, what is the best way or the proper way of querying MySQL database so that I can achieve this goal?

Solution1 --- The one I'm using

$array = array();
$rs = mysqli_query("SELECT * FROM TableA", $con);
while ($row = mysqli_fetch_assoc($rs)) {
    $rs2 = mysqli_query("SELECT * FROM Table2 WHERE ColumnAId=" . $row['ColumnAId'], $con);
    // $array = result in array
    $row['TableB'] = $array2;
}

I'm doubting my code cause its always querying the database.

Solution2

$rs = mysqli_query("SELECT * FROM TableA JOIN TableB ON TableA.ColumnAId=TableB.ColumnAId");
while ($row = mysqli_fet...) {
    // Code
}

The second solution only query once, but if I have thousand of rows in TableA and thousand of rows in TableB for each TableB.ColumnAId (1 TableA.ColumnAId = 1000 TableB.ColumnAId), thus this solution2 takes much time than the solution1?

549

asked Aug 16 '13 13:08

iSa

3 Answers

Neither of the two solutions proposed are probably optimal, BUT solution 1 is UNPREDICTABLE and thus INHERENTLY FLAWED!

One of the first things you learn when dealing with large databases is that 'the best way' to do a query is often dependent upon factors (referred to as meta-data) within the database:

How many rows there are.
How many tables you are querying.
The size of each row.

Because of this, there's unlikely to be a silver bullet solution for your problem. Your database is not the same as my database, you will need to benchmark different optimizations if you need the best performance available.

You will probably find that applying & building correct indexes (and understanding the native implementation of indexes in MySQL) in your database does a lot more for you.

There are some golden rules with queries which should rarely be broken:

Don't do them in loop structures. As tempting as it often is, the overhead on creating a connection, executing a query and getting a response is high.
Avoid SELECT * unless needed. Selecting more columns will significantly increase overhead of your SQL operations.
Know thy indexes. Use the EXPLAIN feature so that you can see which indexes are being used, optimize your queries to use what's available and create new ones.

Because of this, of the two I'd go for the second query (replacing SELECT * with only the columns you want), but there are probably better ways to structure the query if you have the time to optimize.

However, speed should NOT be your only consideration in this, there is a GREAT reason not to use suggestion one:

PREDICTABILITY: why read-locks are a good thing

One of the other answers suggests that having the table locked for a long period of time is a bad thing, and that therefore the multiple-query solution is good.

I would argue that this couldn't be further from the truth. In fact, I'd argue that in many cases the predictability of running a single locking SELECT query is a greater argument FOR running that query than the optimization & speed benefits.

First of all, when we run a SELECT (read-only) query on a MyISAM or InnoDB database (default systems for MySQL), what happens is that the table is read-locked. This prevents any WRITE operations from happening on the table until the read-lock is surrendered (either our SELECT query completes or fails). Other SELECT queries are not affected, so if you're running a multi-threaded application, they will continue to work.

This delay is a GOOD thing. Why, you may ask? Relational data integrity.

Let's take an example: we're running an operation to get a list of items currently in the inventory of a bunch of users on a game, so we do this join:

SELECT * FROM `users` JOIN `items` ON `users`.`id`=`items`.`inventory_id` WHERE `users`.`logged_in` = 1;

What happens if, during this query operation, a user trades an item to another user? Using this query, we see the game state as it was when we started the query: the item exists once, in the inventory of the user who had it before we ran the query.

But, what happens if we're running it in a loop?

Depending on whether the user traded it before or after we read his details, and in which order we read the inventory of the two players, there are four possibilities:

The item could be shown in the first user's inventory (scan user B -> scan user A -> item traded OR scan user B -> scan user A -> item traded).
The item could be shown in the second user's inventory (item traded -> scan user A -> scan user B OR item traded -> scan user B -> scan user A).
The item could be shown in both inventories (scan user A -> item traded -> scan user B).
The item could be shown in neither of the user's inventories (scan user B -> item traded -> scan user A).

What this means is that we would be unable to predict the results of the query or to ensure relational integrity.

If you're planning to give $5,000 to the guy with item ID 1000000 at midnight on Tuesday, I hope you have $10k on hand. If your program relies on unique items being unique when snapshots are taken, you will possibly raise an exception with this kind of query.

Locking is good because it increases predictability and protects the integrity of results.

Note: You could force a loop to lock with a transaction, but it will still be slower.

Oh, and finally, USE PREPARED STATEMENTS!

You should never have a statement that looks like this:

mysqli_query("SELECT * FROM Table2 WHERE ColumnAId=" . $row['ColumnAId'], $con);

mysqli has support for prepared statements. Read about them and use them, they will help you to avoid something terrible happening to your database.

140

answered Oct 04 '22 01:10

Glitch Desire

Definitely second way. Nested query is an ugly thing since you're getting all query overheads (execution, network e t.c.) every time for every nested query, while single JOIN query will be executed once - i.e. all overheads will be done only once.

Simple rule is not to use queries in cycles - in general. There could be exceptions, if one query will be too complex, so due to performance in should be split, but in a certain case that can be shown only by benchmarks and measures.

answered Oct 04 '22 02:10

Alma Do

If you want to do algorithmic evaluation of your data in your application code (which I think is the right thing to do), you should not use SQL at all. SQL was made to be a human readable way to ask for computational achieved data from a relational database, which means, if you just use it to store data, and do the computations in your code, you're doing it wrong anyway.

In such a case you should prefer using a different storage/retrieving possibility like a key-value store (there are persistent ones out there, and newer versions of MySQL exposes a key-value interface as well for InnoDB, but it's still using a relational database for key-value storage, aka the wrong tool for the job).

If you STILL want to use your solution:

Benchmark.

I've often found that issuing multiple queries can be faster than a single query, because MySQL has to parse less query, the optimizer has less work to do, and more often than not the MySQL optimzer just fails (that's the reason things like STRAIGHT JOIN and index hints exist). And even if it does not fail, multiple queries might still be faster depending on the underlying storage engine as well as how many threads try to access the data at once (lock granularity - this only applies with mixing in update queries though - neither MyISAM nor InnoDB lock the whole table for SELECT queries by default). Then again, you might even get different results with the two solutions if you don't use transactions, as data might change between queries if you use multiple queries versus a single one.

In a nutshell: There's way more to your question than what you posted/asked for, and what a generic answer can provide.

Regarding your solutions: I'd prefer the first solution if you have an environment where a) data changes are common and/or b) you have many concurrent running threads (requests) accessing and updating your tables (lock granularity is better with split up queries, as is cacheability of the queries) ; if your database is on a different network, e.g. network latency is an issue, you're probably better of with the first solution (but most people I know have MySQL on the same server, using socket connections, and local socket connections normally don't have much latency).

Situation may also change depending on how often the for loop is actually executed.

Again: Benchmark

Another thing to consider is memory efficiency and algorithmic efficiency. Later one is about O(n) in both cases, but depending on the type of data you use to join, it could be worse in any of the two. E.g. if you use strings to join (you really shouldn't, but you don't say), performance in the more php dependent solution also depends on php hash map algorithm (arrays in php are effectively hash maps) and the likelyhood of a collision, while mysql string indexes are normally fixed length, and thus, depending on your data, might not be applicable.

For memory efficiency, the multi query version is certainly better, as you have the php array anyway (which is very inefficient in terms of memory!) in both solutions, but the join might use a temp table depending on several circumstances (normally it shouldn't, but there ARE cases where it does - you can check using EXPLAIN for your queries)

answered Oct 04 '22 02:10

griffin

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Should I use a JOIN function or run several queries in a loop structure?

Tags:

php

optimization

mysql

iSa

People also ask

3 Answers

Neither of the two solutions proposed are probably optimal, BUT solution 1 is UNPREDICTABLE and thus INHERENTLY FLAWED!

PREDICTABILITY: why read-locks are a good thing

But, what happens if we're running it in a loop?

Oh, and finally, USE PREPARED STATEMENTS!

Glitch Desire

Alma Do

griffin

Recent Activity

Donate For Us

Should I use a JOIN function or run several queries in a loop structure?

Tags:

php

optimization

mysql

iSa

People also ask

3 Answers

Neither of the two solutions proposed are probably optimal, BUT solution 1 is UNPREDICTABLE and thus INHERENTLY FLAWED!

PREDICTABILITY: why read-locks are a good thing

But, what happens if we're running it in a loop?

Oh, and finally, USE PREPARED STATEMENTS!

Glitch Desire

Alma Do

griffin

Related questions

Recent Activity

Donate For Us