I have two MySQL queries, which one by one, run very fast:
QUERY 1
SELECT Ads.AdId FROM Ads, AdsGeometry WHERE
AdsGeometry.AdId = Ads.AdId AND
(ST_CONTAINS(GeomFromText('Polygon((
-4.9783515930176 36.627100703563,
-5.0075340270996 36.61222072018,
-4.9896812438965 36.57638676015,
-4.965991973877 36.579419508882,
-4.955005645752 36.617732160006,
-4.9783515930176 36.627100703563
))'), AdsGeometry.GeomPoint))
GROUP BY Ads.AdId
This query runs in 0.0013 seconds, and returns 4 rows.
QUERY 2
SELECT Ads.AdId FROM Ads, AdsHierarchy WHERE
Ads.AdId = AdsHierarchy.ads_AdId AND
AdsHierarchy.locations_LocationId = 148022797
GROUP BY Ads.AdId
This query runs in 0.0094 seconds, and returns 67 rows (3 of them are the same of the above query).
I am trying to merge these two queries into a single query, because later on, the result set of the two queries should be ordered together, and I would like to do the ordering using MySQL. This is what I tried, and below it, you will find it's explain also:
SELECT Ads.AdId FROM Ads, AdsHierarchy, AdsGeometry WHERE
Ads.AdId = AdsHierarchy.ads_AdId AND
AdsGeometry.AdId = Ads.AdId AND (
ST_CONTAINS(GeomFromText('Polygon((
-4.9783515930176 36.627100703563,
-5.0075340270996 36.61222072018,
-4.9896812438965 36.57638676015,
-4.965991973877 36.579419508882,
-4.955005645752 36.617732160006,
-4.9783515930176 36.627100703563
))'), AdsGeometry.GeomPoint) OR
AdsHierarchy.locations_LocationId = 148022797
)
GROUP BY Ads.AdId
id select_type table type possible_keys key key_len ref rows Extra
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 SIMPLE AdsGeometry ALL PRIMARY,GeomPoint,sx_adsgeometry_geompoint NULL NULL NULL 682848 Using temporary; Using filesort
1 SIMPLE Ads eq_ref PRIMARY PRIMARY 4 dbname.AdsGeometry.AdId 1 Using where; Using index
1 SIMPLE AdsHierarchy ref Ads_AdsHierarchy,locations_LocationId Ads_AdsHierarchy 4 dbname.Ads.AdId 1 Using where
While this query returns the correct result set (68 rows), it needs 6.5937 seconds to run. If I understand this correctly, the AdsHierarchy
table is not using it's indexes, neither does the AdsGeometry
one.
Is there any way to merge the two queries (or possible even more location, or polygon based queries like these) together, and maintain a reasonable amount of speed for it to run?
Thanks!
EDIT: Some information, about the indexes of the 3 tables in question
AdsGeometry
table is MyISAM and primary key is AdId
.
Result of SHOW INDEXES FROM AdsGeometry
is:
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
AdsGeometry 0 PRIMARY 1 AdId A 682848 NULL NULL BTREE
AdsGeometry 1 Latitude 1 Latitude A NULL NULL NULL BTREE
AdsGeometry 1 Longitude 1 Longitude A NULL NULL NULL BTREE
AdsGeometry 1 GeomPoint 1 GeomPoint A NULL 32 NULL SPATIAL
AdsGeometry 1 sx_adsgeometry_geompoint 1 GeomPoint A NULL 32 NULL SPATIAL
AdsGeometry 1 Latitude_2 1 Latitude A NULL NULL NULL BTREE
AdsGeometry 1 Latitude_2 2 Longitude A NULL NULL NULL BTREE
AdsHierarchy
table type is InnoDB, primary key is AdsHierarchyId
.
Result of SHOW INDEXES FROM AdsHierarchy
is:
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
AdsHierarchy 0 PRIMARY 1 AdsHierarchyId A 2479044 NULL NULL BTREE
AdsHierarchy 1 Ads_AdsHierarchy 1 ads_AdId A 2479044 NULL NULL BTREE
AdsHierarchy 1 locations_LocationId 1 locations_LocationId A 123952 NULL NULL BTREE
Ads
table type is InnoDB, primary key is AdId
.
Result of SHOW INDEXES FROM Ads
is:
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Ads 0 PRIMARY 1 AdId A 705411 NULL NULL BTREE
Ads 1 Accounts_Ads 1 accounts_AccountId A 2 NULL NULL BTREE
Ads 1 Ads_Locations 1 locations_LocationId A 88176 NULL NULL BTREE
Ads 1 Categories_Ads 1 categories_CategoryId A 16 NULL NULL BTREE
Ads 1 Currencies_Ads 1 currencies_Currency A 2 NULL NULL BTREE
Ads 1 countries_CountryId 1 countries_CountryId A 204 NULL NULL BTREE
Ads 1 ExternalId 1 ExternalId A 705411 NULL NULL BTREE
Ads 1 ExternalId 2 accounts_AccountId A 705411 NULL NULL BTREE
Ads 1 xml_XMLId 1 xml_XMLId A 4 NULL NULL BTREE
Ads 1 streets_StreetId 1 streets_StreetId A 2 NULL NULL YES BTREE
EDIT 2: The query rewritten with implicit joins, and explained:
This is the query, rewritten to use implicit joins, but it still runs very slowly (5.503 secs)
SELECT a.AdId FROM Ads AS a
JOIN AdsHierarchy AS ah ON a.AdId = ah.ads_AdId
JOIN AdsGeometry AS ag ON a.AdId = ag.AdId
WHERE
ST_CONTAINS(GeomFromText('Polygon((
-4.9783515930176 36.627100703563,
-5.0075340270996 36.61222072018,
-4.9896812438965 36.57638676015,
-4.965991973877 36.579419508882,
-4.955005645752 36.617732160006,
-4.9783515930176 36.627100703563
))'), ag.GeomPoint)
OR ah.locations_LocationId = 148022797
GROUP BY a.AdId
id select_type table type possible_keys key key_len ref rows Extra
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 SIMPLE a index PRIMARY PRIMARY 4 NULL 627853 Using index
1 SIMPLE ag eq_ref PRIMARY,GeomPoint,sx_adsgeometry_geompoint PRIMARY 8 micasa_dev.a.AdId 1 Using index condition
1 SIMPLE ah ref Ads_AdsHierarchy,locations_LocationId Ads_AdsHierarchy 4 micasa_dev.a.AdId 1 Using where
EDIT 3: Trying out UNION-ing the two queries
Also tried the UNION
method, provided by @RobertKoch.
While the following UNION
query runs very fast (0.06 seconds)
SELECT Ads.AdId FROM Ads, AdsGeometry
WHERE
AdsGeometry.AdId = Ads.AdId AND
ST_CONTAINS(GeomFromText('Polygon((
-4.9783515930176 36.627100703563,
-5.0075340270996 36.61222072018,
-4.9896812438965 36.57638676015,
-4.965991973877 36.579419508882,
-4.955005645752 36.617732160006,
-4.9783515930176 36.627100703563
))'), AdsGeometry.GeomPoint)
GROUP BY Ads.AdId
UNION
SELECT Ads.AdId FROM Ads, AdsHierarchy WHERE
Ads.AdId = AdsHierarchy.ads_AdId AND
AdsHierarchy.locations_LocationId = 148022797
GROUP BY Ads.AdId
I still can't use this method, as later on I need to sort the result set which I get from the merging of the two queries, based on the Ads
table.
And if I try to do the following thing, the query once again becomes extremly slow (3.7 seconds):
SELECT Ads.AdId FROM Ads WHERE Ads.AdId IN (
SELECT Ads.AdId FROM Ads, AdsGeometry
WHERE
AdsGeometry.AdId = Ads.AdId AND
ST_CONTAINS(GeomFromText('Polygon((
-4.9783515930176 36.627100703563,
-5.0075340270996 36.61222072018,
-4.9896812438965 36.57638676015,
-4.965991973877 36.579419508882,
-4.955005645752 36.617732160006,
-4.9783515930176 36.627100703563
))'), AdsGeometry.GeomPoint)
GROUP BY Ads.AdId
UNION
SELECT Ads.AdId FROM Ads, AdsHierarchy WHERE
Ads.AdId = AdsHierarchy.ads_AdId AND
AdsHierarchy.locations_LocationId = 148022797
GROUP BY Ads.AdId
) WHERE Ads.AdId > 100000
ORDER BY Ads.ModifiedDate ASC
EDIT 4: Changing where the UNION resides, seems to solve the problem
If I modify the above UNION
query to
SELECT Ads.AdId
FROM Ads,
(SELECT Ads.AdId
FROM Ads,
AdsGeometry
WHERE AdsGeometry.AdId = Ads.AdId
AND ST_CONTAINS(GeomFromText('Polygon((
-4.9783515930176 36.627100703563,
-5.0075340270996 36.61222072018,
-4.9896812438965 36.57638676015,
-4.965991973877 36.579419508882,
-4.955005645752 36.617732160006,
-4.9783515930176 36.627100703563
))'), AdsGeometry.GeomPoint)
GROUP BY Ads.AdId
UNION SELECT Ads.AdId
FROM Ads,
AdsHierarchy
WHERE Ads.AdId = AdsHierarchy.ads_AdId
AND AdsHierarchy.locations_LocationId = 148022797
GROUP BY Ads.AdId) AS nt
WHERE Ads.AdId = nt.AdId
AND Ads.AdId > 1000000
ORDER BY Ads.ModifiedDate ASC
the query then runs fast again (~0.0007 seconds).
If no solution comes without
UNION
, I am willing to give the bounty to anyone who can explain the difference between the twoUNION
versions (this one, and the one in EDIT 3), and explain to me, why the query runs fast when it is written in the following order, and runs slow, when written in the above order.
If any other informations are needed, please ask in the comments, and I try to provide them! Thanks
*NOTE: * I've added an ORDER to the two UNION queries, to make it clearer, that while I am only selecting the AdId
from the tables, I still need the other fields from the Ads
table.
EDIT 5: request of @bovko
1 SIMPLE Ads index NULL countries_CountryId 2 NULL 627853 Using index; Using temporary
1 SIMPLE ag eq_ref PRIMARY PRIMARY 8 micasa_dev.Ads.AdId 1 Using where; Distinct
1 SIMPLE ah ref Ads_AdsHierarchy Ads_AdsHierarchy 4 micasa_dev.Ads.AdId 1 Using where; Distinct
The following is the best process for collecting and aggregating the top queries: Set long_query_time = 0 (in some cases, you may need to rate limit to not flood the log) Enable the slow log and collect for some time (slow_query_log = 1) Stop collection and process the log with pt-query-digest.
There are a number of things that may cause a query to take longer time to execute: Inefficient query – Use non-indexed columns while lookup or joining, thus MySQL takes longer time to match the condition. Table lock – The table is locked, by global lock or explicit table lock when the query is trying to access it.
MySQL can run more than 50,000 simple queries per second on commodity server hardware and over 2,000 queries per second from a single correspondent on a Gigabit network, so running multiple queries isn't necessarily such a bad thing.
IN ( SELECT ... )
is usually vary inefficient. Avoid it.
All the answers so far are working harder than they need to. It seems like the JOINs
are unnecessary until after the UNION
. See more Notes below.
SELECT Ads.AdId
FROM Ads,
JOIN (
( SELECT AdId
FROM AdsGeometry
WHERE ST_CONTAINS(GeomFromText('Polygon(( -4.9783515930176 36.627100703563,
-5.0075340270996 36.61222072018, -4.9896812438965 36.57638676015,
-4.965991973877 36.579419508882, -4.955005645752 36.617732160006,
-4.9783515930176 36.627100703563 ))'),
AdsGeometry.GeomPoint)
AND AdId > 1000000 )
UNION DISTINCT
( SELECT ads_AdId AS AdId
FROM AdsHierarchy
WHERE locations_LocationId = 148022797
AND ads_AdId > 1000000 )
) AS nt ON Ads.AdId = nt.AdId
ORDER BY Ads.ModifiedDate ASC
Notes:
AdsGeometry
and AdsHierarchy
have adId (by different names); there is no need to do the JOIN
in the inner queries, except possibly to verify that it exists in Ads
. Is that an issue? My query will take care of in the outer SELECT's JOIN
, anyway.UNION DISTINCT
is needed because the two SELECTs
may fetch the same ids.> 1000000
inside to cut down on the number of values gathered by the UNION
.UNION
will always (in older versions of MySQL) or sometimes (in newer versions) create a temp table. You are stuck with that.IN ( SELECT ... )
usually optimizes terribly; avoid it.ORDER BY
, etc, to the UNION
; the parens make it clear what it would belong to.ModifiedDate
for ordering. You could speed things up by removing that requirement. (The UNION
probably creates a tmp table; this ORDER BY
probably creates another.)Both your UNION
queries does extra work by searching for the results already found by doing
SELECT Ads.AdId FROM Ads WHERE AdId IN ...
orSELECT Ads.AdId FROM Ads, (SELECT Ads.AdId ...) AS nt WHERE Ads.AdId = nt.AdId
Also SELECT Ads.AdId FROM Ads, ... GROUP BY Ads
will probably be more efficient, as well as easier to understand if written as SELECT DISTINCT Ads.AdID FROM Ads, ...
Thus, this should give a better query:
SELECT DISTINCT AdId FROM
(SELECT Ads.AdId FROM Ads
INNER JOIN AdsGeometry ON AdsGeometry.AdId = Ads.AdId
WHERE
ST_CONTAINS(GeomFromText('Polygon((
-4.9783515930176 36.627100703563,
-5.0075340270996 36.61222072018,
-4.9896812438965 36.57638676015,
-4.965991973877 36.579419508882,
-4.955005645752 36.617732160006,
-4.9783515930176 36.627100703563
))'), AdsGeometry.GeomPoint)
UNION ALL
SELECT Ads.AdId FROM Ads
INNER JOIN AdsHierarchy ON Ads.AdId = AdsHierarchy.ads_AdId
WHERE AdsHierarchy.locations_LocationId = 148022797) AS sub
WHERE AdId > 100000
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With