Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to combine these two mysql queries to preserve the speed of it?

Tags:

mysql

I have two MySQL queries, which one by one, run very fast:

QUERY 1

SELECT Ads.AdId FROM Ads, AdsGeometry WHERE 
      AdsGeometry.AdId = Ads.AdId AND
      (ST_CONTAINS(GeomFromText('Polygon((
         -4.9783515930176 36.627100703563, 
         -5.0075340270996 36.61222072018, 
         -4.9896812438965 36.57638676015, 
         -4.965991973877 36.579419508882, 
         -4.955005645752 36.617732160006, 
         -4.9783515930176 36.627100703563
      ))'), AdsGeometry.GeomPoint)) 
GROUP BY Ads.AdId

This query runs in 0.0013 seconds, and returns 4 rows.

QUERY 2

 SELECT Ads.AdId FROM Ads, AdsHierarchy WHERE 
      Ads.AdId = AdsHierarchy.ads_AdId AND  
      AdsHierarchy.locations_LocationId = 148022797 
 GROUP BY Ads.AdId

This query runs in 0.0094 seconds, and returns 67 rows (3 of them are the same of the above query).

I am trying to merge these two queries into a single query, because later on, the result set of the two queries should be ordered together, and I would like to do the ordering using MySQL. This is what I tried, and below it, you will find it's explain also:

SELECT Ads.AdId FROM Ads, AdsHierarchy, AdsGeometry WHERE 
      Ads.AdId = AdsHierarchy.ads_AdId AND 
      AdsGeometry.AdId = Ads.AdId AND ( 
          ST_CONTAINS(GeomFromText('Polygon((
             -4.9783515930176 36.627100703563, 
             -5.0075340270996 36.61222072018, 
             -4.9896812438965 36.57638676015, 
             -4.965991973877 36.579419508882, 
             -4.955005645752 36.617732160006, 
             -4.9783515930176 36.627100703563
          ))'), AdsGeometry.GeomPoint) OR 
          AdsHierarchy.locations_LocationId = 148022797
      ) 
GROUP BY Ads.AdId

id  select_type     table           type                  possible_keys                                 key                key_len  ref                      rows       Extra
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1   SIMPLE          AdsGeometry     ALL                   PRIMARY,GeomPoint,sx_adsgeometry_geompoint    NULL               NULL     NULL                     682848     Using temporary; Using filesort
1   SIMPLE          Ads             eq_ref                PRIMARY                                       PRIMARY            4        dbname.AdsGeometry.AdId  1          Using where; Using index
1   SIMPLE          AdsHierarchy    ref                   Ads_AdsHierarchy,locations_LocationId         Ads_AdsHierarchy   4        dbname.Ads.AdId          1          Using where

While this query returns the correct result set (68 rows), it needs 6.5937 seconds to run. If I understand this correctly, the AdsHierarchy table is not using it's indexes, neither does the AdsGeometry one.

Is there any way to merge the two queries (or possible even more location, or polygon based queries like these) together, and maintain a reasonable amount of speed for it to run?

Thanks!

EDIT: Some information, about the indexes of the 3 tables in question

AdsGeometry table is MyISAM and primary key is AdId.

Result of SHOW INDEXES FROM AdsGeometry is:

Table           Non_unique  Key_name                    Seq_in_index    Column_name     Collation   Cardinality     Sub_part    Packed      Null    Index_type    Comment   Index_comment
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
AdsGeometry     0           PRIMARY                     1               AdId            A           682848          NULL        NULL        BTREE       
AdsGeometry     1           Latitude                    1               Latitude        A           NULL            NULL        NULL        BTREE       
AdsGeometry     1           Longitude                   1               Longitude       A           NULL            NULL        NULL        BTREE       
AdsGeometry     1           GeomPoint                   1               GeomPoint       A           NULL            32          NULL        SPATIAL         
AdsGeometry     1           sx_adsgeometry_geompoint    1               GeomPoint       A           NULL            32          NULL        SPATIAL         
AdsGeometry     1           Latitude_2                  1               Latitude        A           NULL            NULL        NULL        BTREE       
AdsGeometry     1           Latitude_2                  2               Longitude       A           NULL            NULL        NULL        BTREE       

AdsHierarchy table type is InnoDB, primary key is AdsHierarchyId.

Result of SHOW INDEXES FROM AdsHierarchy is:

Table           Non_unique  Key_name                    Seq_in_index    Column_name           Collation     Cardinality     Sub_part    Packed      Null    Index_type    Comment   Index_comment
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
AdsHierarchy    0           PRIMARY                     1               AdsHierarchyId        A             2479044         NULL        NULL                BTREE       
AdsHierarchy    1           Ads_AdsHierarchy            1               ads_AdId              A             2479044         NULL        NULL                BTREE       
AdsHierarchy    1           locations_LocationId        1               locations_LocationId  A             123952          NULL        NULL                BTREE   

Ads table type is InnoDB, primary key is AdId.

Result of SHOW INDEXES FROM Ads is:

Table           Non_unique  Key_name                    Seq_in_index    Column_name           Collation     Cardinality     Sub_part    Packed      Null    Index_type    Comment   Index_comment
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Ads             0           PRIMARY                     1               AdId                   A            705411          NULL        NULL                BTREE       
Ads             1           Accounts_Ads                1               accounts_AccountId     A            2               NULL        NULL                BTREE       
Ads             1           Ads_Locations               1               locations_LocationId   A            88176           NULL        NULL                BTREE       
Ads             1           Categories_Ads              1               categories_CategoryId  A            16              NULL        NULL                BTREE       
Ads             1           Currencies_Ads              1               currencies_Currency    A            2               NULL        NULL                BTREE       
Ads             1           countries_CountryId         1               countries_CountryId    A            204             NULL        NULL                BTREE       
Ads             1           ExternalId                  1               ExternalId             A            705411          NULL        NULL                BTREE       
Ads             1           ExternalId                  2               accounts_AccountId     A            705411          NULL        NULL                BTREE       
Ads             1           xml_XMLId                   1               xml_XMLId              A            4               NULL        NULL                BTREE       
Ads             1           streets_StreetId            1               streets_StreetId       A            2               NULL        NULL        YES     BTREE   

EDIT 2: The query rewritten with implicit joins, and explained:

This is the query, rewritten to use implicit joins, but it still runs very slowly (5.503 secs)

 SELECT a.AdId FROM Ads AS a 
   JOIN AdsHierarchy AS ah ON a.AdId = ah.ads_AdId   
   JOIN AdsGeometry AS ag ON a.AdId = ag.AdId 
   WHERE 
      ST_CONTAINS(GeomFromText('Polygon((
          -4.9783515930176 36.627100703563, 
          -5.0075340270996 36.61222072018, 
          -4.9896812438965 36.57638676015, 
          -4.965991973877 36.579419508882, 
          -4.955005645752 36.617732160006, 
          -4.9783515930176 36.627100703563
      ))'), ag.GeomPoint) 
      OR ah.locations_LocationId = 148022797
   GROUP BY a.AdId

id  select_type     table           type                  possible_keys                                 key                key_len  ref                      rows       Extra
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1   SIMPLE          a               index                 PRIMARY                                       PRIMARY            4        NULL                     627853     Using index
1   SIMPLE          ag              eq_ref                PRIMARY,GeomPoint,sx_adsgeometry_geompoint    PRIMARY            8        micasa_dev.a.AdId        1          Using index condition
1   SIMPLE          ah              ref                   Ads_AdsHierarchy,locations_LocationId         Ads_AdsHierarchy   4        micasa_dev.a.AdId        1          Using where

EDIT 3: Trying out UNION-ing the two queries

Also tried the UNION method, provided by @RobertKoch.

While the following UNION query runs very fast (0.06 seconds)

SELECT Ads.AdId FROM Ads, AdsGeometry 
WHERE 
    AdsGeometry.AdId = Ads.AdId AND 
    ST_CONTAINS(GeomFromText('Polygon(( 
         -4.9783515930176 36.627100703563, 
         -5.0075340270996 36.61222072018, 
         -4.9896812438965 36.57638676015, 
         -4.965991973877 36.579419508882, 
         -4.955005645752 36.617732160006, 
         -4.9783515930176 36.627100703563 
   ))'), AdsGeometry.GeomPoint) 
GROUP BY Ads.AdId 
UNION 
SELECT Ads.AdId FROM Ads, AdsHierarchy WHERE 
   Ads.AdId = AdsHierarchy.ads_AdId AND  
   AdsHierarchy.locations_LocationId = 148022797 
GROUP BY Ads.AdId

I still can't use this method, as later on I need to sort the result set which I get from the merging of the two queries, based on the Ads table.

And if I try to do the following thing, the query once again becomes extremly slow (3.7 seconds):

SELECT Ads.AdId FROM Ads WHERE Ads.AdId IN (
  SELECT Ads.AdId FROM Ads, AdsGeometry 
  WHERE 
      AdsGeometry.AdId = Ads.AdId AND 
      ST_CONTAINS(GeomFromText('Polygon(( 
         -4.9783515930176 36.627100703563, 
         -5.0075340270996 36.61222072018, 
         -4.9896812438965 36.57638676015, 
         -4.965991973877 36.579419508882, 
         -4.955005645752 36.617732160006, 
         -4.9783515930176 36.627100703563 
      ))'), AdsGeometry.GeomPoint) 
  GROUP BY Ads.AdId 
  UNION 
  SELECT Ads.AdId FROM Ads, AdsHierarchy WHERE 
     Ads.AdId = AdsHierarchy.ads_AdId AND  
     AdsHierarchy.locations_LocationId = 148022797 
  GROUP BY Ads.AdId
) WHERE Ads.AdId > 100000
ORDER BY Ads.ModifiedDate ASC

EDIT 4: Changing where the UNION resides, seems to solve the problem

If I modify the above UNION query to

SELECT Ads.AdId
FROM Ads,
(SELECT Ads.AdId   
    FROM Ads,
    AdsGeometry
   WHERE AdsGeometry.AdId = Ads.AdId
     AND ST_CONTAINS(GeomFromText('Polygon(( 
         -4.9783515930176 36.627100703563, 
         -5.0075340270996 36.61222072018, 
         -4.9896812438965 36.57638676015, 
         -4.965991973877 36.579419508882, 
         -4.955005645752 36.617732160006, 
         -4.9783515930176 36.627100703563 
      ))'), AdsGeometry.GeomPoint)
   GROUP BY Ads.AdId
   UNION SELECT Ads.AdId
   FROM Ads,
        AdsHierarchy
   WHERE Ads.AdId = AdsHierarchy.ads_AdId
     AND AdsHierarchy.locations_LocationId = 148022797
   GROUP BY Ads.AdId) AS nt
WHERE Ads.AdId = nt.AdId
  AND Ads.AdId > 1000000
ORDER BY Ads.ModifiedDate ASC

the query then runs fast again (~0.0007 seconds).

If no solution comes without UNION, I am willing to give the bounty to anyone who can explain the difference between the two UNION versions (this one, and the one in EDIT 3), and explain to me, why the query runs fast when it is written in the following order, and runs slow, when written in the above order.

If any other informations are needed, please ask in the comments, and I try to provide them! Thanks

*NOTE: * I've added an ORDER to the two UNION queries, to make it clearer, that while I am only selecting the AdId from the tables, I still need the other fields from the Ads table.

EDIT 5: request of @bovko

1   SIMPLE  Ads     index   NULL                countries_CountryId     2   NULL                    627853  Using index; Using temporary
1   SIMPLE  ag      eq_ref  PRIMARY             PRIMARY                 8   micasa_dev.Ads.AdId     1       Using where; Distinct  
1   SIMPLE  ah      ref     Ads_AdsHierarchy    Ads_AdsHierarchy        4   micasa_dev.Ads.AdId     1       Using where; Distinct
like image 490
Adam Baranyai Avatar asked Sep 09 '16 12:09

Adam Baranyai


People also ask

How do I fix slow queries in MySQL?

The following is the best process for collecting and aggregating the top queries: Set long_query_time = 0 (in some cases, you may need to rate limit to not flood the log) Enable the slow log and collect for some time (slow_query_log = 1) Stop collection and process the log with pt-query-digest.

Why is MySQL Query taking so long?

There are a number of things that may cause a query to take longer time to execute: Inefficient query – Use non-indexed columns while lookup or joining, thus MySQL takes longer time to match the condition. Table lock – The table is locked, by global lock or explicit table lock when the query is trying to access it.

How many queries can MySQL handle?

MySQL can run more than 50,000 simple queries per second on commodity server hardware and over 2,000 queries per second from a single correspondent on a Gigabit network, so running multiple queries isn't necessarily such a bad thing.


2 Answers

IN ( SELECT ... ) is usually vary inefficient. Avoid it.

All the answers so far are working harder than they need to. It seems like the JOINs are unnecessary until after the UNION. See more Notes below.

SELECT  Ads.AdId
    FROM  Ads, 
    JOIN (
        ( SELECT  AdId
            FROM  AdsGeometry
            WHERE  ST_CONTAINS(GeomFromText('Polygon(( -4.9783515930176 36.627100703563,
                      -5.0075340270996 36.61222072018, -4.9896812438965 36.57638676015,
                      -4.965991973877 36.579419508882, -4.955005645752 36.617732160006,
                      -4.9783515930176 36.627100703563 ))'),
                              AdsGeometry.GeomPoint)
              AND AdId > 1000000 )
         UNION DISTINCT
         ( SELECT  ads_AdId AS AdId
            FROM  AdsHierarchy
            WHERE  locations_LocationId = 148022797
              AND  ads_AdId > 1000000 )
          ) AS nt ON Ads.AdId = nt.AdId
    ORDER BY  Ads.ModifiedDate ASC

Notes:

  • Both AdsGeometry and AdsHierarchy have adId (by different names); there is no need to do the JOIN in the inner queries, except possibly to verify that it exists in Ads. Is that an issue? My query will take care of in the outer SELECT's JOIN, anyway.
  • UNION DISTINCT is needed because the two SELECTs may fetch the same ids.
  • Move the > 1000000 inside to cut down on the number of values gathered by the UNION.
  • The UNION will always (in older versions of MySQL) or sometimes (in newer versions) create a temp table. You are stuck with that.
  • IN ( SELECT ... ) usually optimizes terribly; avoid it.
  • I added some parentheses; It is possible (but not necessary at the moment) to add ORDER BY, etc, to the UNION; the parens make it clear what it would belong to.
  • The only reason for the outer query is to get ModifiedDate for ordering. You could speed things up by removing that requirement. (The UNION probably creates a tmp table; this ORDER BY probably creates another.)
like image 63
Rick James Avatar answered Oct 05 '22 08:10

Rick James


Both your UNIONqueries does extra work by searching for the results already found by doing

SELECT Ads.AdId FROM Ads WHERE AdId IN ...
or
SELECT Ads.AdId FROM Ads, (SELECT Ads.AdId ...) AS nt WHERE Ads.AdId = nt.AdId

Also SELECT Ads.AdId FROM Ads, ... GROUP BY Ads will probably be more efficient, as well as easier to understand if written as SELECT DISTINCT Ads.AdID FROM Ads, ...

Thus, this should give a better query:

SELECT DISTINCT AdId FROM
  (SELECT Ads.AdId FROM Ads
   INNER JOIN AdsGeometry ON AdsGeometry.AdId = Ads.AdId 
   WHERE 
      ST_CONTAINS(GeomFromText('Polygon(( 
         -4.9783515930176 36.627100703563, 
         -5.0075340270996 36.61222072018, 
         -4.9896812438965 36.57638676015, 
         -4.965991973877 36.579419508882, 
         -4.955005645752 36.617732160006, 
         -4.9783515930176 36.627100703563 
      ))'), AdsGeometry.GeomPoint) 
  UNION ALL
  SELECT Ads.AdId FROM Ads
  INNER JOIN AdsHierarchy ON Ads.AdId = AdsHierarchy.ads_AdId
  WHERE AdsHierarchy.locations_LocationId = 148022797) AS sub 
WHERE AdId > 100000
like image 26
Terje D. Avatar answered Oct 05 '22 08:10

Terje D.