Say I have a table `order` as

id | clientid | type | amount | itemid | date
---|----------|------|--------|--------|-----------
23 | 258 | B | 150 | 14 | 2012-04-03
24 | 258 | S | 69 | 14 | 2012-04-03
25 | 301 | S | 10 | 20 | 2012-04-03
26 | 327 | B | 54 | 156 | 2012-04-04
- `clientid` is a foreign key back to the `client` table
- `itemid` is a foreign key back to an `item` table
- `type` is only `B` or `S`
- `amount` is an integer

and a table `processed` as

id | orderid | processed | date
---|---------|-----------|-----------
41 | 23 | true | 2012-04-03
42 | 24 | true | 2012-04-03
43 | 25 | false | <NULL>
44 | 26 | true | 2012-04-05
I need to get all the rows from `order` that, for the same `clientid` on the same `date`, have opposing `type` values. Keep in mind `type` can only have one of two values: `B` or `S`. In the example above this would be rows `23` and `24`.

The other constraint is that the corresponding row in `processed` must be `true` for the `orderid`.
My query so far:

    SELECT c1.clientid,
           c1.date, c1.type, c1.itemid, c1.amount,
           c2.date, c2.type, c2.itemid, c2.amount
    FROM order c1
    INNER JOIN order c2
            ON c1.itemid = c2.itemid
           AND c1.date = c2.date
           AND c1.clientid = c2.clientid
           AND c1.type <> c2.type
           AND c1.id < c2.id
    INNER JOIN processed p1
            ON p1.orderid = c1.id
           AND p1.processed = true
    INNER JOIN processed p2
            ON p2.orderid = c2.id
           AND p2.processed = true
QUESTION: Keeping the `processed = true` as part of the join clause is slowing the query down. If I move it to the WHERE clause then the performance is much better. This has piqued my interest and I'd like to know why.
The primary keys and respective foreign key columns are indexed while the value columns (`value`, `processed` etc) aren't.
Disclaimer: I have inherited this DB structure and the performance difference is roughly 6 seconds.
In MSSQL, both queries are compiled to the same execution plan, so there's no difference.
For an INNER JOIN, put only the join conditions in the ON clause and keep filter conditions in the WHERE clause. For a LEFT JOIN, put any filter conditions on the right-hand table in the ON clause; in the WHERE clause they would discard the unmatched rows that the LEFT JOIN is meant to keep.
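To make the INNER JOIN versus LEFT JOIN distinction concrete, here is a minimal sketch that runs the four combinations against the question's sample data. It assumes SQLite (through Python's `sqlite3` module) as a stand-in for the real database, stores the boolean as `1`/`0`, and keeps only the columns the comparison needs; the helper `rows` is a name made up for this illustration.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in; booleans are stored as 1/0.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "order" (id INTEGER PRIMARY KEY, clientid INTEGER);
    CREATE TABLE processed (id INTEGER PRIMARY KEY, orderid INTEGER, processed INTEGER);
    INSERT INTO "order" VALUES (23, 258), (24, 258), (25, 301), (26, 327);
    INSERT INTO processed VALUES (41, 23, 1), (42, 24, 1), (43, 25, 0), (44, 26, 1);
""")


def rows(sql):
    """Hypothetical helper: run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# INNER JOIN: the filter gives the same rows whether it sits in ON or in WHERE.
inner_on = rows('''SELECT o.id, p.id FROM "order" o
                   INNER JOIN processed p ON p.orderid = o.id AND p.processed = 1
                   ORDER BY o.id''')
inner_where = rows('''SELECT o.id, p.id FROM "order" o
                      INNER JOIN processed p ON p.orderid = o.id
                      WHERE p.processed = 1
                      ORDER BY o.id''')

# LEFT JOIN: in ON, order 25 survives with a NULL match; in WHERE, it is dropped.
left_on = rows('''SELECT o.id, p.id FROM "order" o
                  LEFT JOIN processed p ON p.orderid = o.id AND p.processed = 1
                  ORDER BY o.id''')
left_where = rows('''SELECT o.id, p.id FROM "order" o
                     LEFT JOIN processed p ON p.orderid = o.id
                     WHERE p.processed = 1
                     ORDER BY o.id''')

print("inner, filter in ON:   ", inner_on)
print("inner, filter in WHERE:", inner_where)
print("left,  filter in ON:   ", left_on)
print("left,  filter in WHERE:", left_where)
```

The two inner-join forms are logically equivalent, so any speed difference between them comes from the plan the database picks, not from the result they describe.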
No, it doesn't. The query optimizer will transform your code anyway. You'd better choose a convention and stick with it.
A WHERE clause will generally improve performance. It is usually more expensive to return the data and filter it in the application, because the database can optimize the query using indexes and partitions, and it may also be able to execute the query in parallel.
The reason you're seeing a difference is the execution plan that the planner puts together, which evidently differs between the two queries (arguably it should optimise the two queries to the same plan, and this may be a bug). In other words, the planner thinks it has to work in a particular way to get to the result for each statement.
When you do it within the JOIN, the planner will probably have to select from the table, filter by the "True" part, then join the result sets. I would imagine this is a large table, and therefore a lot of data to look through, and it can't use the indexes as efficiently.
I suspect that if you do it in a WHERE clause, the planner is choosing a route that is more efficient (i.e. either index-based or working from a pre-filtered dataset).
You could probably make the join version just as fast (if not faster) by adding an index on the two columns, `orderid` and `processed`. PostgreSQL supports multicolumn indexes, and covering indexes with INCLUDE columns are available from version 11.
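As a rough sketch of that two-column index, the snippet below creates one and asks the engine how it would run a lookup on both columns. It assumes SQLite via Python's `sqlite3` as a stand-in (with `1` for `true`), and the index name `idx_processed_orderid_processed` is made up for the illustration. The `CREATE INDEX` statement has the same form in PostgreSQL, but the plan output printed here is SQLite's and will not look like PostgreSQL's.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in; booleans are stored as 1/0.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE processed (id INTEGER PRIMARY KEY, orderid INTEGER,
                            processed INTEGER, date TEXT);
    INSERT INTO processed VALUES
        (41, 23, 1, '2012-04-03'), (42, 24, 1, '2012-04-03'),
        (43, 25, 0, NULL), (44, 26, 1, '2012-04-05');
    -- Hypothetical index name; covers the join key and the filter column together.
    CREATE INDEX idx_processed_orderid_processed ON processed (orderid, processed);
""")

# The lookup the join performs for each order row: match the key, keep processed rows.
lookup = "SELECT orderid FROM processed WHERE orderid = ? AND processed = 1"
matches = conn.execute(lookup, (23,)).fetchall()

# Ask SQLite how it intends to execute the lookup (output format varies by version).
plan = conn.execute("EXPLAIN QUERY PLAN " + lookup, (23,)).fetchall()
for step in plan:
    print(step)

# Confirm the index was created.
index_names = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
print(index_names)
```

Whether the real planner uses such an index depends on the table sizes and statistics, so it is worth checking the plan before and after creating it.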
In short, the planner is the problem: it is choosing two different routes to get to the result sets, and one of those is not as efficient as the other. It's impossible for us to know what the reasons are without the full table information and the EXPLAIN ANALYZE output.
If you want specifics on why your particular query is doing this, you'll need to provide more information. However, the underlying reason is the planner choosing different routes.
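For anyone wanting to compare the two forms side by side, the following sketch runs both against the question's sample data, checks that they return the same rows, and prints the plan for each. It assumes SQLite via Python's `sqlite3` as a stand-in, so `EXPLAIN QUERY PLAN` takes the place of PostgreSQL's `EXPLAIN ANALYZE` and `1` takes the place of `true`; the select list is trimmed to the two order ids, and the variable names are made up for the illustration.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in; booleans are stored as 1/0.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "order" (id INTEGER PRIMARY KEY, clientid INTEGER, type TEXT,
                          amount INTEGER, itemid INTEGER, date TEXT);
    CREATE TABLE processed (id INTEGER PRIMARY KEY, orderid INTEGER,
                            processed INTEGER, date TEXT);
    INSERT INTO "order" VALUES
        (23, 258, 'B', 150, 14, '2012-04-03'),
        (24, 258, 'S', 69, 14, '2012-04-03'),
        (25, 301, 'S', 10, 20, '2012-04-03'),
        (26, 327, 'B', 54, 156, '2012-04-04');
    INSERT INTO processed VALUES
        (41, 23, 1, '2012-04-03'), (42, 24, 1, '2012-04-03'),
        (43, 25, 0, NULL), (44, 26, 1, '2012-04-05');
""")

# Shared part of both queries: pair up opposing orders for the same client, item and date.
self_join = '''
    SELECT c1.id, c2.id
    FROM "order" c1
    INNER JOIN "order" c2
            ON c1.itemid = c2.itemid AND c1.date = c2.date
           AND c1.clientid = c2.clientid AND c1.type <> c2.type
           AND c1.id < c2.id
'''
# Variant 1: the processed filter lives in the ON clauses.
filter_in_on = self_join + '''
    INNER JOIN processed p1 ON p1.orderid = c1.id AND p1.processed = 1
    INNER JOIN processed p2 ON p2.orderid = c2.id AND p2.processed = 1
'''
# Variant 2: the processed filter lives in the WHERE clause.
filter_in_where = self_join + '''
    INNER JOIN processed p1 ON p1.orderid = c1.id
    INNER JOIN processed p2 ON p2.orderid = c2.id
    WHERE p1.processed = 1 AND p2.processed = 1
'''

results, plans = {}, {}
for label, sql in (("ON", filter_in_on), ("WHERE", filter_in_where)):
    results[label] = conn.execute(sql).fetchall()
    plans[label] = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    print(label, "rows:", results[label])
    for step in plans[label]:
        print("    ", step)
```

On PostgreSQL the equivalent step would be to prefix each variant with `EXPLAIN ANALYZE` and compare the two outputs; posting those alongside the table definitions is the information the answer above asks for.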
Additional Reading Material:
http://www.postgresql.org/docs/current/static/explicit-joins.html
From a quick skim, it seems that explicit JOIN syntax can limit how far the Postgres planner re-orders joins (see the `join_collapse_limit` setting described on that page). Try changing the order of the joins in your statement to see whether you then get the same performance. Just a thought.