Performance difference: condition placed at INNER JOIN vs WHERE clause

Say I have a table order as

id | clientid | type | amount | itemid | date
---|----------|------|--------|--------|-----------
23 | 258      | B    | 150    | 14     | 2012-04-03
24 | 258      | S    | 69     | 14     | 2012-04-03
25 | 301      | S    | 10     | 20     | 2012-04-03
26 | 327      | B    | 54     | 156    | 2012-04-04

clientid is a foreign key back to the client table
itemid is a foreign key back to an item table
type is only B or S
amount is an integer

and a table processed as

id | orderid | processed | date
---|---------|-----------|-----------
41 | 23      | true      | 2012-04-03
42 | 24      | true      | 2012-04-03
43 | 25      | false     | <NULL>
44 | 26      | true      | 2012-04-05

I need all the rows from order that have opposing type values for the same clientid on the same date. Keep in mind that type can only have one of two values, B or S. In the example above, this would be rows 23 and 24.

The other constraint is that the corresponding row in processed must have processed = true for that orderid.

My query so far

    -- "order" is quoted because ORDER is a reserved word in PostgreSQL
    SELECT c1.clientid,
           c1.date,
           c1.type,
           c1.itemid,
           c1.amount,
           c2.date,
           c2.type,
           c2.itemid,
           c2.amount
    FROM   "order" c1
    INNER JOIN "order" c2 ON c1.itemid    =  c2.itemid   AND
                             c1.date      =  c2.date     AND
                             c1.clientid  =  c2.clientid AND
                             c1.type     <>  c2.type     AND
                             c1.id        <  c2.id
    INNER JOIN processed p1 ON p1.orderid   =  c1.id AND
                               p1.processed =  true
    INNER JOIN processed p2 ON p2.orderid   =  c2.id AND
                               p2.processed =  true

QUESTION: Keeping the processed = true as part of the join clause is slowing the query down. If I move it to the WHERE clause then the performance is much better. This has piqued my interest and I'd like to know why.
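For reference, the WHERE variant being compared is presumably along these lines. This is only a sketch: the post shows the JOIN form alone, so the exact placement of the two processed filters is an assumption. The table name is quoted because ORDER is a reserved word in PostgreSQL.

```sql
SELECT c1.clientid, c1.date, c1.type, c1.itemid, c1.amount,
       c2.date, c2.type, c2.itemid, c2.amount
FROM   "order" c1
INNER JOIN "order" c2 ON c1.itemid   =  c2.itemid
                     AND c1.date     =  c2.date
                     AND c1.clientid =  c2.clientid
                     AND c1.type     <> c2.type
                     AND c1.id       <  c2.id
INNER JOIN processed p1 ON p1.orderid = c1.id
INNER JOIN processed p2 ON p2.orderid = c2.id
-- Assumed rewrite: both processed filters moved out of the ON clauses
WHERE  p1.processed = true
  AND  p2.processed = true;
```

Because every join here is an INNER JOIN, both forms return the same rows; only the plan chosen to produce them can differ.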

The primary keys and respective foreign key columns are indexed while the value columns (value, processed etc) aren't.

Disclaimer: I have inherited this DB structure and the performance difference is roughly 6 seconds.

asked Jun 01 '12 10:06

Insectatorious

1 Answer

The difference you're seeing comes from the execution plan that the planner puts together, which evidently differs between the two queries. (Arguably, it should optimise both queries to the same plan, so this may be a bug.) In other words, the planner thinks it has to work in a different way to get to the result for each statement.

When the condition is inside the JOIN, the planner probably has to select from the table, filter on the processed = true part, and then join the result sets. I would imagine this is a large table, so there is a lot of data to look through, and it can't use the indexes as efficiently.

I suspect that with the condition in the WHERE clause, the planner chooses a more efficient route (i.e. either index-based or working on a pre-filtered dataset).

You could probably make the join version just as fast (if not faster) by adding an index on the two columns involved, orderid and processed. PostgreSQL supports multicolumn indexes, and since version 11 also INCLUDE columns on B-tree indexes.
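A minimal sketch of what such an index might look like. The index names are hypothetical, and whether either index actually helps depends on the data distribution, so treat this as something to test rather than a guaranteed fix. The second statement swaps in a partial index, which is a different technique from the two-column index suggested above.

```sql
-- Option 1: multicolumn index covering both columns used in the join condition
CREATE INDEX processed_orderid_processed_idx
    ON processed (orderid, processed);

-- Option 2 (alternative): partial index that only contains the rows
-- where processed is true, keyed on orderid
CREATE INDEX processed_orderid_true_idx
    ON processed (orderid)
    WHERE processed = true;

-- Refresh planner statistics after creating an index
ANALYZE processed;
```

If most rows are already processed, neither index is likely to be much more selective than the existing index on orderid.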

In short, the planner is the problem: it is choosing two different routes to get to the result sets, and one of them is less efficient than the other. It's impossible for us to know the reasons without the full table definitions and the EXPLAIN ANALYZE output.
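As a sketch of how to collect that information, the query can be prefixed with EXPLAIN. The statement below uses the JOIN form from the question with a shortened select list, and quotes "order" because it is a reserved word in PostgreSQL. Running it again with the two processed conditions moved to a WHERE clause gives the second plan to compare.

```sql
-- ANALYZE executes the query and reports actual row counts and timings;
-- BUFFERS adds shared-buffer hit/read counts for each plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT c1.clientid, c1.date, c1.type, c2.type
FROM   "order" c1
INNER JOIN "order" c2 ON c1.itemid   =  c2.itemid
                     AND c1.date     =  c2.date
                     AND c1.clientid =  c2.clientid
                     AND c1.type     <> c2.type
                     AND c1.id       <  c2.id
INNER JOIN processed p1 ON p1.orderid = c1.id AND p1.processed = true
INNER JOIN processed p2 ON p2.orderid = c2.id AND p2.processed = true;
```

The things to compare between the two outputs are the join order, the join methods (nested loop, hash, merge), and any large gaps between estimated and actual row counts.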

If you want specifics on why your particular query behaves this way, you'll need to provide more information. However, the underlying reason is the planner choosing different routes.

Additional Reading Material:

http://www.postgresql.org/docs/current/static/explicit-joins.html

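Just skimmed it: the planner can normally reorder inner joins, but it follows the written order of explicit JOINs when join_collapse_limit is set to 1, or when the query joins more tables than that limit allows. Try changing the order of the joins in your statement to see whether you then get the same performance. Just a thought.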
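The linked page describes the join_collapse_limit setting, which controls how far the planner rewrites explicit JOINs into a freely reorderable list. A rough way to test whether join order is a factor here, assuming a session where changing settings is acceptable:

```sql
-- Show the current value (the default is 8)
SHOW join_collapse_limit;

-- Force the planner to join tables in exactly the order written
SET join_collapse_limit = 1;

-- ... run EXPLAIN ANALYZE on both versions of the query here ...

-- Restore the default for this session
RESET join_collapse_limit;
```

If the two versions produce the same plan with the limit at 1 but different plans at the default, the difference comes from how the planner reorders the joins rather than from where the condition is written.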

answered Oct 02 '22 23:10

Martin

Tags: performance, sql, postgresql, query-optimization
