
Which provides better performance: one big join or multiple queries?

I have a table called orders. One column on orders is customer_id.
I have a table called customers with 10 fields.

If I want to build up an array of order objects, where each order object embeds a customer object, I have two choices.

Option 1:

a. First query the orders table. b. Loop through the records and query the customers table to get the record for each customer.

This would be something like:

 SELECT * FROM Applications

 SELECT * FROM Customers WHERE id = 1
 SELECT * FROM Customers WHERE id = 2
 SELECT * FROM Customers WHERE id = 3
 SELECT * FROM Customers WHERE id = etc . . .

Option 2:

a. Do a join on all fields.

It's obviously #2, because you are only doing one query versus 1 + [numberOfOrders] queries (which could be hundreds or more).

This would be something like:

 SELECT * FROM Applications a
 INNER JOIN Customers c ON c.id = a.customerID

My main question is: what if I had 10 other tables hanging off the orders table (similar to customers), each with its id stored in the orders table? Should you do a single query that joins all 10 tables, or at some point is it inefficient to do this?
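As a sketch, such a query would look something like this (Addresses and Statuses here are placeholder tables, not my real schema):

 SELECT *
 FROM Applications a
 INNER JOIN Customers c ON c.id = a.customerID
 INNER JOIN Addresses ad ON ad.id = a.addressID
 INNER JOIN Statuses s ON s.id = a.statusID
 -- ... and so on for the remaining lookup tables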

Any suggestions would help. Is there any optimization to ensure fast performance?

asked Dec 19 '09 by leora



3 Answers

I agree with everyone who's said a single join will probably be more efficient, even with a lot of tables. It's also less development effort than doing the work in your application code. This assumes the tables are appropriately indexed, with an index on each foreign key column, and (of course) an index on each primary key column.

Your best bet is to try the easiest approach (the big join) first, and see how well it performs. If it performs well, then great - you're done. If it performs poorly, profile the query and look for missing indexes on your tables.
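For example, a minimal sketch in MySQL (the index name below is just illustrative):

 -- Ask the optimizer for its plan; watch for full table scans on the joined tables
 EXPLAIN
 SELECT *
 FROM Applications a
 INNER JOIN Customers c ON c.id = a.customerID;

 -- If the foreign key column has no index, add one
 CREATE INDEX idx_applications_customer_id ON Applications (customerID);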

Your option #1 is not likely to perform well, due to the number of network round-trips (as anijhaw mentioned). This is sometimes called the "select N+1" problem - you do one SELECT to get the list of N applications, and then do N SELECTs in a loop to get the customers. This record-at-a-time looping is natural to application programmers; but SQL works much better when you operate on whole sets of data at once.

If option #2 is slow even with good indexing, you may want to look into caching. You can cache in the database (using a summary table or materialized/indexed view), in the application (if there is enough RAM), or in a dedicated caching server such as memcached. Of course, this depends on how up-to-date your query results need to be. If everything has to be fully up-to-date, then any cache would have to be updated whenever the underlying tables are updated - it gets complicated and becomes less useful.
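As a rough sketch of the summary-table idea (all names here are invented for illustration), you could rebuild a denormalized copy on a schedule and serve reads from it:

 -- Rebuild the cache table periodically, e.g. from a cron job
 DROP TABLE IF EXISTS application_customer_cache;
 CREATE TABLE application_customer_cache AS
 SELECT a.*, c.name AS customer_name
 FROM Applications a
 INNER JOIN Customers c ON c.id = a.customerID;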

This sounds like a reporting query though, and reporting often doesn't need to be real-time. So caching might be able to help you.

Depending on your DBMS, another thing to think about is the impact of this query on other queries hitting the same database. If your DBMS allows readers to block writers, then this query could prevent updates to the tables if it takes a long time to run. That would be bad. Oracle doesn't have this problem, and neither does SQL Server when run in "read committed snapshot" mode. I don't know about MySQL though.
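For reference, on SQL Server that mode is enabled per database, something like this (the database name is a placeholder):

 -- SQL Server: readers see a row-version snapshot instead of blocking on writers
 ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;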

answered by Richard Beier


If customer_id is unique in your customers table (and the other IDs are unique in their tables), so that your query only returns one row per application, then doing a single SELECT is certainly more efficient.

Joining all the required customers in one query can be optimized by the database, while lots of single SELECTs cannot.
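To make that one-row guarantee explicit, a sketch (assuming id is not already declared as a key):

 -- At most one customer row per id, so the join cannot multiply rows
 ALTER TABLE Customers ADD PRIMARY KEY (id);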

EDIT
I tried this in Oracle PL/SQL with 50,000 applications and 50,000 matching customers.

Selecting everything in one query took 0.172 s.

Selecting each customer with its own SELECT took 1.984 s.

And this will most likely get worse with other clients connected, or when accessing the database over a network.

answered by Peter Lang


A single join should be faster for two main reasons.

If you are querying over a network, there is round-trip overhead in issuing many queries instead of a single one.

A join is optimized inside the DBMS by the query optimizer, so it will be faster than executing several separate queries.

answered by anijhaw