I have this query, which runs extremely slowly (almost a minute):
select distinct main.PrimeId
from PRIME main
join
(
select distinct p.PrimeId from PRIME p
left outer join ATTRGROUP a
on p.PrimeId = a.PrimeId or p.PrimeId = a.RelatedPrimeId
where a.PrimeId is not null and a.RelatedPrimeId is not null
) mem
on main.PrimeId = mem.PrimeId
The PRIME table has 18k rows, and has PK on PrimeId.
The ATTRGROUP table has 24k rows, and has a composite PK on PrimeId, col2, then RelatedPrimeId, and then cols 4-7. There's also a separate index on RelatedPrimeId.
The query eventually returns 8.5k rows: the distinct values of PrimeId on the PRIME table that match either PrimeId or RelatedPrimeId on the ATTRGROUP table.
I have the identical query, using ATTRADDRESS instead of ATTRGROUP. ATTRADDRESS has an identical key and index structure as ATTRGROUP. It has only 11k rows on it, which is smaller, admittedly, but in that case, the query runs in about a second, and returns 11k rows.
So my question is this:
How can the query be so much slower on one table than on another when the structures are identical?
So far, I've tried this on SQL 2005, and (using the same database, upgraded) SQL 2008 R2. Two of us have independently obtained the same results, restoring the same backup to two different computers.
Other details:
However, the actual number of rows on that table is a little over 24k, not 320M!
If I refactor the part of the query inside the brackets, so that it uses a UNION rather than an OR, thus:
select distinct main.PrimeId
from PRIME main
join
(
select distinct p.PrimeId from PRIME p
left outer join ATTRGROUP a
on p.PrimeId = a.PrimeId
where a.PrimeId is not null and a.RelatedPrimeId is not null
UNION
select distinct p.PrimeId from PRIME p
left outer join ATTRGROUP a
on p.PrimeId = a.RelatedPrimeId
where a.PrimeId is not null and a.RelatedPrimeId is not null
) mem
on main.PrimeId = mem.PrimeId
... then the slow query takes under a second.
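To confirm that the UNION rewrite is a safe drop-in for the OR join, here is a minimal sketch that runs both queries against a tiny in-memory SQLite copy of the schema. SQLite only stands in for SQL Server here, so this says nothing about timings or plans; the sample rows and the index name IX_ATTRGROUP_RelatedPrimeId are invented for illustration, and only the result sets are compared.

```python
import sqlite3

# Hypothetical SQLite stand-in for the question's SQL Server schema.
# The rows are invented; this compares result sets only, not performance.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE PRIME (PrimeId INTEGER PRIMARY KEY);
    CREATE TABLE ATTRGROUP (
        PrimeId        INTEGER NOT NULL,
        col2           INTEGER NOT NULL,
        RelatedPrimeId INTEGER NOT NULL,
        PRIMARY KEY (PrimeId, col2, RelatedPrimeId)
    );
    CREATE INDEX IX_ATTRGROUP_RelatedPrimeId ON ATTRGROUP (RelatedPrimeId);
""")
con.executemany("INSERT INTO PRIME VALUES (?)", [(i,) for i in range(1, 11)])
con.executemany("INSERT INTO ATTRGROUP VALUES (?, ?, ?)",
                [(1, 0, 2), (1, 1, 3), (4, 0, 4), (7, 0, 9)])

# Original form: a single join with an OR in the ON clause.
OR_QUERY = """
    SELECT DISTINCT main.PrimeId
    FROM PRIME main
    JOIN (
        SELECT DISTINCT p.PrimeId FROM PRIME p
        LEFT OUTER JOIN ATTRGROUP a
          ON p.PrimeId = a.PrimeId OR p.PrimeId = a.RelatedPrimeId
        WHERE a.PrimeId IS NOT NULL AND a.RelatedPrimeId IS NOT NULL
    ) mem ON main.PrimeId = mem.PrimeId
"""

# Refactored form: one single-column join per branch, combined with UNION.
UNION_QUERY = """
    SELECT DISTINCT main.PrimeId
    FROM PRIME main
    JOIN (
        SELECT DISTINCT p.PrimeId FROM PRIME p
        LEFT OUTER JOIN ATTRGROUP a ON p.PrimeId = a.PrimeId
        WHERE a.PrimeId IS NOT NULL AND a.RelatedPrimeId IS NOT NULL
        UNION
        SELECT DISTINCT p.PrimeId FROM PRIME p
        LEFT OUTER JOIN ATTRGROUP a ON p.PrimeId = a.RelatedPrimeId
        WHERE a.PrimeId IS NOT NULL AND a.RelatedPrimeId IS NOT NULL
    ) mem ON main.PrimeId = mem.PrimeId
"""

or_ids = {row[0] for row in con.execute(OR_QUERY)}
union_ids = {row[0] for row in con.execute(UNION_QUERY)}
print(sorted(or_ids), or_ids == union_ids)  # [1, 2, 3, 4, 7, 9] True
```

Both forms return every PrimeId that appears in either ATTRGROUP column. A plausible reason the UNION version is faster is that each branch is a plain equality join on one column, which the optimizer can match to the PK (leading column PrimeId) or to the RelatedPrimeId index, whereas an OR across two columns is harder to turn into index seeks.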
I'd greatly appreciate any insight on this! Let me know if you need any more info and I'll update the question. Thanks!
By the way, I realise that in this example there's a redundant join. This can't easily be removed, since in production the whole thing is generated dynamically, and the bit in the brackets takes many different forms.
Edit:
I've rebuilt the indexes on ATTRGROUP; it makes no significant difference.
Edit 2:
If I use a temporary table, thus:
select distinct p.PrimeId into #temp
from PRIME p
left outer join ATTRGROUP a
on p.PrimeId = a.PrimeId or p.PrimeId = a.RelatedPrimeId
where a.PrimeId is not null and a.RelatedPrimeId is not null
select distinct main.PrimeId
from Prime main join
#temp mem
on main.PrimeId = mem.PrimeId
... then again, even with an OR in the original OUTER JOIN, it runs in less than a second. I hate temp tables like this, since it always feels like an admission of defeat, so it isn't the refactor I'll be using, but I thought it was interesting that it makes such a difference.
Edit 3:
Updating the stats makes no difference either.
Thanks for all your suggestions so far.
Queries can become slow for various reasons, ranging from improper index usage to bugs in the storage engine itself. However, in most cases, queries become slow because developers or database administrators neglect to monitor them and keep an eye on their performance.
The first time you run your query, data is read from storage. The next time you run that query, a lot of the data and indexes will be cached in memory.
Your application itself changed, and it can't consume the results as quickly. Someone applied a patch that had an unexpected side effect. You have the same plan, but different memory grants. Someone is modifying more rows at a time, so you're hitting lock escalation.
In my experience, it's better to use two left joins rather than an OR in the JOIN clause. So instead of:
left outer join ATTRGROUP a
on p.PrimeId = a.PrimeId or p.PrimeId = a.RelatedPrimeId
I would suggest:
left outer join ATTRGROUP a
on p.PrimeId = a.PrimeId
left outer join ATTRGROUP a2
on p.PrimeId = a2.RelatedPrimeId
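One detail the two-join version needs: the question's WHERE clause (a.PrimeId is not null and a.RelatedPrimeId is not null) only looks at alias a, so if it is kept unchanged, PrimeIds that appear only as a RelatedPrimeId are silently dropped. A minimal sketch, using an in-memory SQLite database with invented rows as a stand-in for SQL Server, compares the OR join with the two-join form filtered by an OR over both aliases, and with the unchanged filter:

```python
import sqlite3

# Hypothetical SQLite stand-in for the question's schema, with invented rows.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE PRIME (PrimeId INTEGER PRIMARY KEY);
    CREATE TABLE ATTRGROUP (
        PrimeId        INTEGER NOT NULL,
        col2           INTEGER NOT NULL,
        RelatedPrimeId INTEGER NOT NULL,
        PRIMARY KEY (PrimeId, col2, RelatedPrimeId)
    );
""")
con.executemany("INSERT INTO PRIME VALUES (?)", [(i,) for i in range(1, 11)])
con.executemany("INSERT INTO ATTRGROUP VALUES (?, ?, ?)",
                [(1, 0, 2), (1, 1, 3), (4, 0, 4), (7, 0, 9)])

OR_JOIN = """
    SELECT DISTINCT p.PrimeId FROM PRIME p
    LEFT OUTER JOIN ATTRGROUP a
      ON p.PrimeId = a.PrimeId OR p.PrimeId = a.RelatedPrimeId
    WHERE a.PrimeId IS NOT NULL AND a.RelatedPrimeId IS NOT NULL
"""

def two_joins(where):
    # Two left joins; DISTINCT collapses any fan-out between a and a2.
    return f"""
        SELECT DISTINCT p.PrimeId FROM PRIME p
        LEFT OUTER JOIN ATTRGROUP a  ON p.PrimeId = a.PrimeId
        LEFT OUTER JOIN ATTRGROUP a2 ON p.PrimeId = a2.RelatedPrimeId
        WHERE {where}
    """

def ids(sql):
    return {row[0] for row in con.execute(sql)}

or_ids = ids(OR_JOIN)
# Filter adjusted to accept a match through either alias.
two_join_ids = ids(two_joins("a.PrimeId IS NOT NULL OR a2.RelatedPrimeId IS NOT NULL"))
# Original filter kept as-is: only matches through alias a survive.
old_filter_ids = ids(two_joins("a.PrimeId IS NOT NULL AND a.RelatedPrimeId IS NOT NULL"))

print(or_ids == two_join_ids)   # True
print(sorted(old_filter_ids))   # [1, 4, 7]
```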
I notice that the main query isn't correlated with the subquery. A correlated version (a derived table in a plain JOIN can't reference main, so this uses CROSS APPLY):
select distinct main.PrimeId
from PRIME main
cross apply
(
select distinct p.PrimeId from PRIME p
left outer join ATTRGROUP a
on p.PrimeId = a.PrimeId
where main.PrimeId = a.PrimeId
UNION
select distinct p.PrimeId from PRIME p
left outer join ATTRGROUP a
on p.PrimeId = a.RelatedPrimeId
where main.PrimeId = a.RelatedPrimeId
) mem
where main.PrimeId = mem.PrimeId
In this construction you don't need the 'is not null' clauses either (do you ever need them, given that a primary key column can never hold a null value?).
I was taught to avoid OR constructions (as others have already advised), but also to avoid 'is not null' and 'in (value list)' constructions. These can usually be replaced by a (NOT) EXISTS clause.
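As an illustration of that advice, the derived table can be written with two EXISTS tests instead of an OR join plus 'is not null' checks. A minimal sketch, again with an in-memory SQLite database and invented rows standing in for SQL Server; it only shows that the result sets match, and whether SQL Server plans the EXISTS form better would need to be checked against the real data:

```python
import sqlite3

# Hypothetical SQLite stand-in for the question's schema, with invented rows.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE PRIME (PrimeId INTEGER PRIMARY KEY);
    CREATE TABLE ATTRGROUP (
        PrimeId        INTEGER NOT NULL,
        col2           INTEGER NOT NULL,
        RelatedPrimeId INTEGER NOT NULL,
        PRIMARY KEY (PrimeId, col2, RelatedPrimeId)
    );
""")
con.executemany("INSERT INTO PRIME VALUES (?)", [(i,) for i in range(1, 11)])
con.executemany("INSERT INTO ATTRGROUP VALUES (?, ?, ?)",
                [(1, 0, 2), (1, 1, 3), (4, 0, 4), (7, 0, 9)])

OR_JOIN = """
    SELECT DISTINCT p.PrimeId FROM PRIME p
    LEFT OUTER JOIN ATTRGROUP a
      ON p.PrimeId = a.PrimeId OR p.PrimeId = a.RelatedPrimeId
    WHERE a.PrimeId IS NOT NULL AND a.RelatedPrimeId IS NOT NULL
"""

# Semi-joins: each PRIME row is returned at most once, so no DISTINCT is needed.
EXISTS_QUERY = """
    SELECT p.PrimeId FROM PRIME p
    WHERE EXISTS (SELECT 1 FROM ATTRGROUP a WHERE a.PrimeId = p.PrimeId)
       OR EXISTS (SELECT 1 FROM ATTRGROUP a WHERE a.RelatedPrimeId = p.PrimeId)
"""

exists_rows = [row[0] for row in con.execute(EXISTS_QUERY)]
or_ids = {row[0] for row in con.execute(OR_JOIN)}
print(sorted(exists_rows))         # [1, 2, 3, 4, 7, 9]
print(set(exists_rows) == or_ids)  # True
```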
This is not a direct answer, but if you have FK constraints from ATTRGROUP.PrimeId and ATTRGROUP.RelatedPrimeId referencing PRIME.PrimeId (the table aliased main), then your query is equivalent to this much simpler one:
select PrimeId from ATTRGROUP a
union
select RelatedPrimeId from ATTRGROUP a
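The "if you have FK constraints" condition matters. A minimal sketch, using in-memory SQLite databases with invented rows as a stand-in for SQL Server, runs the question's OR-join derived table and the two-column UNION against a schema with the foreign keys, and against one without them that contains an orphan RelatedPrimeId (99, hypothetical):

```python
import sqlite3

ORIGINAL = """
    SELECT DISTINCT p.PrimeId FROM PRIME p
    LEFT OUTER JOIN ATTRGROUP a
      ON p.PrimeId = a.PrimeId OR p.PrimeId = a.RelatedPrimeId
    WHERE a.PrimeId IS NOT NULL AND a.RelatedPrimeId IS NOT NULL
"""
SIMPLE = """
    SELECT PrimeId FROM ATTRGROUP
    UNION
    SELECT RelatedPrimeId FROM ATTRGROUP
"""

def make_db(with_fk):
    # Hypothetical SQLite stand-in for the question's schema, with invented rows.
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    refs = "REFERENCES PRIME (PrimeId)" if with_fk else ""
    con.execute("CREATE TABLE PRIME (PrimeId INTEGER PRIMARY KEY)")
    con.execute(f"""CREATE TABLE ATTRGROUP (
        PrimeId        INTEGER NOT NULL {refs},
        col2           INTEGER NOT NULL,
        RelatedPrimeId INTEGER NOT NULL {refs},
        PRIMARY KEY (PrimeId, col2, RelatedPrimeId))""")
    con.executemany("INSERT INTO PRIME VALUES (?)", [(i,) for i in range(1, 11)])
    con.executemany("INSERT INTO ATTRGROUP VALUES (?, ?, ?)",
                    [(1, 0, 2), (1, 1, 3), (4, 0, 4), (7, 0, 9)])
    return con

def ids(con, sql):
    return {row[0] for row in con.execute(sql)}

fk_db = make_db(with_fk=True)
fk_original, fk_simple = ids(fk_db, ORIGINAL), ids(fk_db, SIMPLE)

# Without the FKs, an orphan RelatedPrimeId (99 is not in PRIME) can be stored.
loose_db = make_db(with_fk=False)
loose_db.execute("INSERT INTO ATTRGROUP VALUES (7, 1, 99)")
loose_original, loose_simple = ids(loose_db, ORIGINAL), ids(loose_db, SIMPLE)

print(fk_original == fk_simple)                  # True
print(99 in loose_simple, 99 in loose_original)  # True False
```

With the constraints in place every value in either column must exist in PRIME, so the join to PRIME adds nothing; without them, the simpler query can return ids that the original would filter out.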