I have a table called orders; one column on orders is customer_id.
I have a table called customers with 10 fields.
If I want to build up an array of order objects, where embedded in each order object is a customer object, I have two choices.
Option 1: first query the orders table, then loop through the records and query the customers table to get the record for each customer.
This would be something like:
Select * from Applications
Select * from Customers where id = 1
Select * from Customers where id = 2
Select * from Customers where id = 3
Select * from Customers where id = etc . . .
Option 2: do a join on all fields.
Option 2 is the obvious choice, because you are only doing one query versus 1 + [number of orders] queries (which could be hundreds or more).
This would be something like:
Select * from Applications a
Inner join Customers c on c.id = a.customerID
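To make the join approach concrete, here is a minimal sketch in Python with SQLite. The column names (orders.id, customers.name) and the sample data are assumptions for illustration only:

```python
import sqlite3

# A minimal sketch of the join approach: one query, then fold each joined
# row into an order dict with an embedded customer dict.
# Column names (orders.id, customers.name) are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id));
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 1);
""")

# One round trip to the database instead of 1 + N.
rows = conn.execute("""
    SELECT o.id, c.id, c.name
    FROM orders o
    INNER JOIN customers c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

orders = [
    {"id": order_id, "customer": {"id": cust_id, "name": cust_name}}
    for order_id, cust_id, cust_name in rows
]
print(orders[0])  # {'id': 10, 'customer': {'id': 1, 'name': 'Alice'}}
```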
My main question is: what if I had 10 other tables hanging off the orders table (similar to customers), with each id stored in the orders table? Should you do a single query that joins all 10 tables, or does that become inefficient at some point?
Any suggestions would help. Is there any optimization to ensure fast performance?
In Postgres (and probably any RDBMS to a similar extent, MySQL to a lesser extent), fewer queries are almost always much faster.
I know that one INSERT is much faster than, say, 10 individual INSERT queries. It might also matter whether your multiple queries run inside a stored procedure or originate from the application (edit your question with this info); the former will be much quicker than the latter.
I won't leave you in suspense: between joins and subqueries, joins tend to execute faster. In fact, retrieval using a join will almost always outperform a query that employs a subquery, because joins reduce the processing burden on the database by replacing multiple queries with a single join query.
When there are a large number of rows in the tables and an index is available, an INNER JOIN is generally faster than an OUTER JOIN, since the OUTER JOIN has to return more records than the INNER JOIN.
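The row-count difference is easy to see in a small sketch (SQLite via Python; table and column names are illustrative assumptions): an order whose customer_id matches no customer still appears in the LEFT (outer) JOIN result with NULLs, but is dropped by the INNER JOIN.

```python
import sqlite3

# Why an OUTER JOIN returns more records than an INNER JOIN:
# unmatched rows survive the outer join (with NULL customer columns).
# Table/column names here are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Alice');
    INSERT INTO orders VALUES (10, 1), (11, 999);  -- 999 has no customer row
""")

inner = conn.execute("""
    SELECT o.id, c.name FROM orders o
    INNER JOIN customers c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

outer = conn.execute("""
    SELECT o.id, c.name FROM orders o
    LEFT JOIN customers c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

print(inner)  # [(10, 'Alice')]
print(outer)  # [(10, 'Alice'), (11, None)] -- extra row, NULL customer
```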
I agree with everyone who's said a single join will probably be more efficient, even with a lot of tables. It's also less development effort than doing the work in your application code. This assumes the tables are appropriately indexed, with an index on each foreign key column, and (of course) an index on each primary key column.
Your best bet is to try the easiest approach (the big join) first, and see how well it performs. If it performs well, then great - you're done. If it performs poorly, profile the query and look for missing indexes on your tables.
Your option #1 is not likely to perform well, due to the number of network round-trips (as anijhaw mentioned). This is sometimes called the "select N+1" problem - you do one SELECT to get the list of N applications, and then do N SELECTs in a loop to get the customers. This record-at-a-time looping is natural to application programmers; but SQL works much better when you operate on whole sets of data at once.
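The "select N+1" count is easy to demonstrate by tallying the statements each approach issues (a sketch in Python/SQLite with made-up sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 3), (13, 1);
""")

def run_counted(issued, sql, params=()):
    # Record each statement so we can count round trips.
    issued.append(sql)
    return conn.execute(sql, params).fetchall()

# Option #1: select N+1 -- one query for orders, then one per order.
n_plus_1 = []
orders = run_counted(n_plus_1, "SELECT id, customer_id FROM orders")
for _, customer_id in orders:
    run_counted(n_plus_1, "SELECT * FROM customers WHERE id = ?",
                (customer_id,))

# Option #2: a single join over the whole set.
single = []
run_counted(single, """
    SELECT o.id, c.name FROM orders o
    INNER JOIN customers c ON c.id = o.customer_id
""")

print(len(n_plus_1))  # 5  (1 + 4 orders)
print(len(single))    # 1
```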
If option #2 is slow even with good indexing, you may want to look into caching. You can cache in the database (using a summary table or materialized/indexed view), in the application (if there is enough RAM), or in a dedicated caching server such as memcached. Of course, this depends on how up-to-date your query results need to be. If everything has to be fully up-to-date, then any cache would have to be updated whenever the underlying tables are updated - it gets complicated and becomes less useful.
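An in-application cache with invalidate-on-write can be sketched in a few lines (illustrative only; a real system would use TTLs, a summary table, or a cache server such as memcached):

```python
import sqlite3

# Minimal in-application cache sketch: query results are memoized in a
# dict, and every write clears the cache so stale results are never
# served. This is the simple "fully up-to-date" policy described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.execute("INSERT INTO orders VALUES (10, 1)")

_cache = {}

def cached_query(sql):
    if sql not in _cache:
        _cache[sql] = conn.execute(sql).fetchall()
    return _cache[sql]

def write(sql, params=()):
    conn.execute(sql, params)
    _cache.clear()  # invalidate on every write to stay fully up to date

first = cached_query("SELECT * FROM orders")   # hits the database
second = cached_query("SELECT * FROM orders")  # served from the cache
write("INSERT INTO orders VALUES (?, ?)", (11, 2))
third = cached_query("SELECT * FROM orders")   # re-queried after the write

print(first, third)
```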
This sounds like a reporting query though, and reporting often doesn't need to be real-time. So caching might be able to help you.
Depending on your DBMS, another thing to think about is the impact of this query on other queries hitting the same database. If your DBMS allows readers to block writers, then this query could prevent updates to the tables if it takes a long time to run. That would be bad. Oracle doesn't have this problem, and neither does SQL Server when run in "read committed snapshot" mode. I don't know about MySQL though.
If this customer_id is unique in your customers table (and the other IDs are unique in the other tables), so that your query returns only one row per application, then doing a single SELECT is certainly more efficient.
Joining all the required customers in one query lets the optimizer do its work, while lots of single SELECTs cannot be optimized that way.
EDIT
I tried this with Oracle PL/SQL with 50,000 applications and 50,000 matching customers.
The solution selecting everything in one query took 0.172 s.
The solution selecting every customer in a single SELECT took 1.984 s.
And this will most likely get worse with other clients or when accessing over a network.
A single join should be faster for two main reasons.
If you are querying over a network, there is per-query overhead, so issuing many queries costs more than a single query.
A join is optimized inside the DBMS by the query optimizer, so it will be faster than executing several separate queries.