I have to JOIN two large tables in a MySQL query and it takes really long - approximately 180 seconds. Are there any tips for optimizing a merge?
My table has 10 fields. I am only using 4 in the query - all strings. Table has about 600,000 rows and the result should have about 50 rows.
The four columns used are: Title, Variables, Location, Date
Here is my query:
SELECT DISTINCT t1.Title, t1.Variables FROM `MyTABLE` t1 JOIN `MyTABLE` t2
USING (Title, Variables)
WHERE (t1.Location, t1.Date) = ('Location1', 'Date1')
AND (t2.Location, t2.Date) = ('Location2', 'Date2')
Like others pointed out, you need proper indexes. For this particular query, you can benefit from indexes like:
(Location, Date) or (Date, Location) (for the WHERE clause)
and
(Title, Variables) or (Variables, Title) (for the join condition, ON clause)
It would be helpful to know the exact size (that is, the datatype) of the Location, Date, Title, and Variables columns, as a large index is likely to be slower than a small one.
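As a rough sketch, the indexes suggested above could be created like this. The index names are made up for illustration, and the prefix lengths in the commented-out variant are assumptions that would have to be chosen from the real data:

```sql
-- Index names are hypothetical; table and column names come from the question.

-- Supports the WHERE conditions on Location and Date (used by both t1 and t2).
CREATE INDEX idx_location_date ON `MyTABLE` (`Location`, `Date`);

-- Supports the join on (Title, Variables).
CREATE INDEX idx_title_variables ON `MyTABLE` (`Title`, `Variables`);

-- If the string columns are long, a prefix index keeps the index smaller.
-- The prefix length of 20 is an assumption, not a recommendation.
-- CREATE INDEX idx_title_variables_prefix
--     ON `MyTABLE` (`Title`(20), `Variables`(20));
```

Which column order works best depends on the data, so it is worth checking with EXPLAIN which index MySQL actually picks.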
Finally, just a tip: I would not use fancy comparison constructs like you do. The
USING (Title, Variables)
is probably OK, but I would certainly check whether
(t1.Location, t1.Date) = ('Location1', 'Date1')
and
(t2.Location, t2.Date) = ('Location2', 'Date2')
are behaving like you expect. So I would definitely run EXPLAIN on it, and compare the output with a "regular" old-fashioned comparison, like so:
t1.Location = 'Location1'
AND t1.Date = 'Date1'
AND t2.Location = 'Location2'
AND t2.Date = 'Date2'
You may argue that logically it is the same and it shouldn't matter, and you'd be right. But then again, MySQL's optimizer isn't very smart, and there is always a possibility of bugs, especially with features that aren't used a lot. I think this is such a feature. So I would at least try EXPLAIN and see whether these alternate notations are evaluated the same way.
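The comparison could look something like this, using the table and column names from the question's query. It is only a sketch; the actual plans depend on your MySQL version and on which indexes exist:

```sql
-- Row-constructor form, as written in the question.
EXPLAIN
SELECT DISTINCT t1.Title, t1.Variables
FROM `MyTABLE` t1
JOIN `MyTABLE` t2 USING (Title, Variables)
WHERE (t1.Location, t1.Date) = ('Location1', 'Date1')
  AND (t2.Location, t2.Date) = ('Location2', 'Date2');

-- Same query with plain column comparisons.
EXPLAIN
SELECT DISTINCT t1.Title, t1.Variables
FROM `MyTABLE` t1
JOIN `MyTABLE` t2 USING (Title, Variables)
WHERE t1.Location = 'Location1'
  AND t1.Date = 'Date1'
  AND t2.Location = 'Location2'
  AND t2.Date = 'Date2';
```

In the two outputs, compare the type, key, and rows columns for t1 and t2. If one form shows a full scan (type ALL, key NULL) where the other uses an index, the notation is what is hurting you.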
But, as BenoKrapo pointed out, would it not be easier to do something like this:
SELECT Title, Variables
FROM MyTABLE
WHERE (Location = 'Location1' AND Date = 'Date1')
OR (Location = 'Location2' AND Date = 'Date2')
GROUP BY Title, Variables
HAVING COUNT(*) >= 2
EDIT: I changed HAVING COUNT(*) = 2 to HAVING COUNT(*) >= 2. See comments (thanks again, BenoKrapo).
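One caveat with COUNT(*) >= 2: if the same Title/Variables pair can occur more than once for a single Location/Date (an assumption, since the question doesn't say whether such duplicates exist), it would also match pairs that appear twice for one location and never for the other. A possible variant that counts distinct Location/Date combinations instead:

```sql
SELECT Title, Variables
FROM `MyTABLE`
WHERE (Location = 'Location1' AND `Date` = 'Date1')
   OR (Location = 'Location2' AND `Date` = 'Date2')
GROUP BY Title, Variables
-- The WHERE clause lets through only two Location/Date combinations,
-- so a distinct count of 2 means both are present for this pair.
HAVING COUNT(DISTINCT Location, `Date`) = 2;
```

Whether this is faster or slower than the COUNT(*) version is something I would again check with EXPLAIN on the real data.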
EDIT: Days after posting this answer, I found this post from Mark Callaghan, MySQL Architect for Facebook: http://www.facebook.com/note.php?note_id=243134480932 Essentially, he describes how similar-but-different 'smart' comparisons deliver abysmal performance due to a MySQL optimizer bug. So my point is: try to unfancy your syntax when you suffer, because you might have hit a bug.