Query performance; not sure what's happening

I had issues with my query that took 17 seconds to execute (350k rows):

SELECT idgps_unit, MAX(dt) 
         FROM gps_unit_location
        GROUP BY 1

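Explain:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | gps_unit_location | index | | fk_gps2 | 5 | | 422633 |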

After playing with it, I came up with this solution, which takes 1 second:

SELECT idgps_unit, MAX(dt)
FROM (
    SELECT idgps_unit, dt
    FROM gps_unit_location
) d1
GROUP BY 1

Explain:

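id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | PRIMARY | <derived2> | ALL | | | | | 423344 | Using temporary; Using filesort
2 | DERIVED | gps_unit_location | index | | gps_unit_location_dt_gpsid | 10 | | 422617 | Using index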

Now I am confused: why is query #2 fast, when query #1 appears to be the same query and looks like it is written more efficiently?

Indexes: Index1: dt; Index2: idgps_unit; Index3: idgps_unit + dt

The execution times are consistent: query #1 always takes 17-19 seconds, while query #2 takes less than 1 second.

I am using a GoDaddy Economy VPS running Windows Server 2008.

Table example:

id | idgps_unit | dt | location
1 | 1 | 2012-01-01 | 1
2 | 1 | 2012-01-02 | 2
3 | 2 | 2012-01-03 | 3
4 | 2 | 2012-01-04 | 4
5 | 3 | 2012-01-05 | 5
Andrew asked Nov 04 '22 04:11


1 Answer

First, I'm assuming gps_unit_location is really a table and not a view. Second, I'm also assuming that you have run both queries multiple times, so caching is not the explanation. (Caching would mean that the first query loads the table into the page cache and the second then reads from memory rather than disk.)

Do you have an index on gps_unit_location(idgps_unit)? Are the records very wide? If the answer to both questions is "yes", then the following may be happening.

You might have a curious problem with indexing. You would think that an index would speed up such a query. What it does, though, is look up the values of idgps_unit in order. If the index does not contain the date, then the database needs to fetch dt from the data page for each row. If the table does not fit into memory, then this will often result in a cache miss, that is, time spent loading the page from disk.

By contrast, if the table is wide and the engine does a full table scan, then it can zip through the table and extract the two fields of interest. It puts them on the side. If they are small relative to the full table, then sorting them might take very little time. Voila, the query finishes faster.

My guess would be that the second structure removes the use of an index.
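
One way to test this guess is sketched below. It assumes MySQL, and that fk_gps2 (the key shown in the first EXPLAIN) is the single-column index on idgps_unit. If the plan and the timing change once that index is ignored, the index lookups are the likely cause.

```sql
-- List the indexes that exist on the table and their column order.
SHOW INDEX FROM gps_unit_location;

-- Rerun the slow query while telling the optimizer not to use fk_gps2,
-- the index reported in the "key" column of the first EXPLAIN.
-- Assumption: fk_gps2 covers only idgps_unit.
EXPLAIN
SELECT idgps_unit, MAX(dt)
FROM gps_unit_location IGNORE INDEX (fk_gps2)
GROUP BY idgps_unit;
```

Running the same SELECT without EXPLAIN gives a timing to compare against the 17-second original.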

By the way, you can fix this by changing the index to gps_unit_location(idgps_unit, dt). With dt included in the index, the query can be answered from the index alone and does not have to load the table's data pages.
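
A minimal sketch of that fix, assuming MySQL. The index name idx_unit_dt is hypothetical. The question lists an existing index on idgps_unit + dt; if it already has this column order, the CREATE INDEX step is unnecessary and the hint should name that index instead.

```sql
-- idx_unit_dt is a hypothetical name. Column order matters:
-- the GROUP BY column comes first, then the column passed to MAX().
CREATE INDEX idx_unit_dt ON gps_unit_location (idgps_unit, dt);

-- Check which index the optimizer picks for the original query.
EXPLAIN
SELECT idgps_unit, MAX(dt)
FROM gps_unit_location
GROUP BY idgps_unit;

-- If it still picks the narrower index, hint the new one explicitly.
SELECT idgps_unit, MAX(dt)
FROM gps_unit_location FORCE INDEX (idx_unit_dt)
GROUP BY idgps_unit;
```

If the Extra column of the EXPLAIN output contains "Using index", the query is being served from the index without reading the table rows.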

Gordon Linoff answered Nov 09 '22 16:11