Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Speeding up inner joins between a large table and a small table

This may be a silly question, but it may shed some light on how joins work internally.

Let's say I have a large table L and a small table S (100K rows vs. 100 rows).

Would there be any difference in terms of speed between the following two options?:

OPTION 1:                 OPTION 2: ---------                 --------- SELECT *                  SELECT * FROM L INNER JOIN S       FROM S INNER JOIN L ON L.id = S.id;           ON L.id = S.id; 

Notice that the only difference is the order in which the tables are joined.

I realize performance may vary between different SQL languages. If so, how would MySQL compare to Access?

like image 782
Zaid Avatar asked Feb 13 '10 08:02

Zaid


2 Answers

No, the order does not matter.

Almost all RDBMS's (such MS Access, MySQL, SQL Server, ORACLE etc) use a cost based optimiser based upon column statistics. In most situations, the optimiser will choose a correct plan. In the example you gave, the order will not matter (provided statistics are up to date).

To decide what query strategy to use, the Jet Engine optimizer uses statistics. The following factors are some of the factors that these statistics are based on:

  • The number of records in a table
  • The number of data pages in a table
  • The location of the table
  • Whether indexes are present
  • How unique the indexes are

Note: You cannot view Jet database engine optimization schemes, and you cannot specify how to optimize a query. However, you can use the Database Documenter to determine whether indexes are present and how unique an index is.

Based on these statistics, the Optimizer then selects the best internal query strategy for dealing with a particular query.

The statistics are updated whenever a query is compiled. A query is flagged for compiling when you save any changes to the query (or its underlying tables) and when the database is compacted. If a query is flagged for compiling, the compiling and the updating of statistics occurs the next time that the query is run. Compiling typically takes from one second to four seconds.

If you add a significant number of records to your database, you must open and then save your queries to recompile the queries. For example, if you design and then test a query by using a small set of sample data, you must re-compile the query after additional records are added to the database. When you do this, you want to make sure that optimal query performance is achieved when your application is in use.

Ref.

Might be of interest: ACC: How to Optimize Queries in Microsoft Access 2.0, Microsoft Access 95, and Microsoft Access 97

Tony Toews's Microsoft Access Performance FAQ is worth reading.

There's a caveat to "JOIN order does not matter".

If your RDBMS's cost based query optimiser times out creating the query plan then the join order COULD matter. Cost based optimisers have finite resources (both CPU time and memory) in which to construct a query plan. If they time out during the compilation stage, you will get the best plan found so far.

TLDR; If you have complex queries that receive a plan compilation timeout (not query execution timeout), then put your most restrictive joins first. That way, at the point the query plan optimiser times out, it will increase the chance that a 'better' plan was found.

Of course, if you are experiencing query plan compilation timeouts, you should probably simplify your query.

like image 91
Mitch Wheat Avatar answered Oct 01 '22 19:10

Mitch Wheat


I know Oracle's not on your list, but I think that most modern databases will behave that way.

You can see in the following execution plan, that there is no difference between the two statements.

It's a full access to each of the two tables (no index in my case), and then a HASH JOIN. Since you want everything from both tables, both tables need to be read and joined, the sequence does not have an impact.

--------------------------------------------------------------------------- | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     | --------------------------------------------------------------------------- |   0 | SELECT STATEMENT   |      |   100 |   700 |    42  (12)| 00:00:01 | |*  1 |  HASH JOIN         |      |   100 |   700 |    42  (12)| 00:00:01 | |   2 |   TABLE ACCESS FULL| S    |   100 |   300 |     2   (0)| 00:00:01 | |   3 |   TABLE ACCESS FULL| L    |   100K|   390K|    38   (8)| 00:00:01 | --------------------------------------------------------------------------- 
like image 24
Peter Lang Avatar answered Oct 01 '22 17:10

Peter Lang