I had issues with my query that took 17 seconds to execute (350k rows):
SELECT idgps_unit, MAX(dt)
FROM gps_unit_location
GROUP BY 1
EXPLAIN output:
id | select_type | table | type | key | key_len | rows
1 | SIMPLE | gps_unit_location | index | fk_gps2 | 5 | 422633
After playing with it, I came up with this solution, which takes 1 second:
SELECT idgps_unit, MAX(dt) FROM (
SELECT idgps_unit, dt
FROM gps_unit_location
) d1
GROUP BY 1
EXPLAIN output:
id | select_type | table | type | key | key_len | rows | Extra
1 | PRIMARY | <derived2> | ALL | | | 423344 | Using temporary; Using filesort
2 | DERIVED | gps_unit_location | index | gps_unit_location_dt_gpsid | 10 | 422617 | Using index
Now I am confused: why is query #2 fast, when query #1 appears to be the same query and is written more directly?
Indexes on the table: Index1: dt; Index2: idgps_unit; Index3: idgps_unit + dt.
The execution times are consistent: query #1 always takes 17-19 seconds, while query #2 takes under 1 second.
I am using Godaddy VPS Windows Server 2008 Economy
Table example:
id | idgps_unit | dt | location
1 | 1 | 2012-01-01 | 1
2 | 1 | 2012-01-02 | 2
3 | 2 | 2012-01-03 | 3
4 | 2 | 2012-01-04 | 4
5 | 3 | 2012-01-05 | 5
First, I'm assuming gps_unit_location is really a table and not a view. Second, I'm also assuming that you have run both queries multiple times, so caching is not the explanation. (Caching would mean that the first query loads the table into the page cache and the second then reads from memory rather than disk.)
Do you have an index on gps_unit_location(idgps_unit)? Are the records very wide? If the answers to these questions are "yes", then the following may be happening.
You might have a curious problem with indexing. You would think that an index would speed up such a query. What it does, though, is look up the values of idgps_unit in index order. If the index does not contain the date, then the database needs to fetch the row from its data page for every index entry. If the table does not fit into memory, this will often result in a cache miss, that is, time spent loading the page from disk.
By contrast, if the table is wide and the engine does a full table scan, then it can zip through the table and extract the two fields of interest. It puts them on the side. If they are small relative to the full table, then sorting them might take very little time. Voila, the query finishes faster.
My guess would be that the second structure removes the use of an index.
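One way to test this guess is to tell MySQL not to use the single-column index and compare the plan and the timing. This is only a sketch: fk_gps2 is the key name taken from the first EXPLAIN in the question, and the optimizer is free to pick a different index or plan on your server.

```sql
-- Ask the optimizer to skip the index on idgps_unit (fk_gps2 in the question's EXPLAIN).
-- If the theory holds, the plan should change and the query should no longer
-- walk the table row by row in index order.
EXPLAIN
SELECT idgps_unit, MAX(dt)
FROM gps_unit_location IGNORE INDEX (fk_gps2)
GROUP BY idgps_unit;

-- Run the same statement without EXPLAIN to compare timings with query #1.
SELECT idgps_unit, MAX(dt)
FROM gps_unit_location IGNORE INDEX (fk_gps2)
GROUP BY idgps_unit;
```

If the hinted query runs about as fast as query #2, that points to the row lookups through the index as the cost.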
By the way, you can fix this by changing the index to gps_unit_location(idgps_unit, dt). With the date included in the index, the query can be answered from the index alone and does not have to load the table rows.
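As a rough sketch of that fix (the index name idx_unit_dt is made up for illustration, and the question suggests a similar index may already exist, in which case only the hint at the end is relevant):

```sql
-- Composite index with the grouping column first and the aggregated column second.
-- idx_unit_dt is a hypothetical name; skip this if an (idgps_unit, dt) index already exists.
CREATE INDEX idx_unit_dt ON gps_unit_location (idgps_unit, dt);

-- Check which key the optimizer picks. A covering plan normally reports
-- "Using index" (or "Using index for group-by") in the Extra column.
EXPLAIN
SELECT idgps_unit, MAX(dt)
FROM gps_unit_location
GROUP BY idgps_unit;

-- If the optimizer still prefers the single-column index, an index hint
-- can be used to steer it toward the composite one.
SELECT idgps_unit, MAX(dt)
FROM gps_unit_location FORCE INDEX (idx_unit_dt)
GROUP BY idgps_unit;
```

Column order matters here: with idgps_unit first, the latest dt for each unit sits at the end of that unit's range in the index.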