When I add an ORDER BY clause to my query, it becomes very slow.
Here's my query without ORDER BY:
SELECT ClientIpAddress, Agentstring, Count(ClientIpAddress) AS Count FROM LogEntries
WHERE SiteIisId = 3 AND DateTime >= '13-09-2012 00:00:00'
GROUP BY ClientIpAddress, Agentstring
LIMIT 5
ET: 1 ms
And now with ORDER BY:
SELECT ClientIpAddress, Agentstring, Count(ClientIpAddress) AS Count FROM LogEntries
WHERE SiteIisId = 3 AND DateTime >= '13-09-2012 00:00:00'
GROUP BY ClientIpAddress, Agentstring
ORDER BY Count DESC
LIMIT 5
ET: 294 ms
The table I'm querying contains 1,380,855 rows.
Here's the index I'm using:
CREATE INDEX "LogEntries_MostActiveClients" ON "LogEntries" ("ClientIpAddress" ASC, "Agentstring" ASC, "SiteIisId" ASC, "DateTime" DESC)
Using EXPLAIN QUERY PLAN, SQLite tells me it's scanning the table using my index, and is using a TEMP B-TREE for my ORDER BY.
How can I overcome this problem? Obviously I can't index the Count, so what can I do?
Thanks a million!
When you are stepping through a result set, SQLite tries to compute as many values as possible on the fly.
So, in your first query, SQLite never needs to group all the address/agent values in the table; as soon as it has read the records of the first five ClientIpAddress/Agentstring combinations through some index, it can stop.
In your second query, this is not possible: all address/agent groups have to be completely computed before they can be sorted and the first five ones chosen.
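This difference shows up directly in the query plan. A small reproduction (schema trimmed to the columns the query touches) using Python's built-in sqlite3 module:

```python
import sqlite3

# Minimal copy of the table and index from the question, just to inspect
# the plan; no data is needed for EXPLAIN QUERY PLAN.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE LogEntries (
    ClientIpAddress TEXT, Agentstring TEXT, SiteIisId INTEGER, DateTime TEXT)""")
conn.execute("""CREATE INDEX LogEntries_MostActiveClients ON LogEntries
    (ClientIpAddress ASC, Agentstring ASC, SiteIisId ASC, DateTime DESC)""")

plan = conn.execute("""EXPLAIN QUERY PLAN
    SELECT ClientIpAddress, Agentstring, COUNT(ClientIpAddress) AS Count
    FROM LogEntries
    WHERE SiteIisId = 3 AND DateTime >= '2012-09-13 00:00:00'
    GROUP BY ClientIpAddress, Agentstring
    ORDER BY Count DESC
    LIMIT 5""").fetchall()
for row in plan:
    # The last column is the human-readable plan step; with ORDER BY on the
    # aggregate, one step reads "USE TEMP B-TREE FOR ORDER BY".
    print(row[-1])
```

The exact plan text varies between SQLite versions, but the temp B-tree step for the ORDER BY is there because the sort key (the aggregate) can only be known after all groups are computed.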
The records in the temporary result to be sorted are already in the cache, and are smaller than the data in the original table, so I'd guess that most of the time is not spent sorting, but grouping.
If the sorting were the problem, and if you had an estimate of how large the five largest counts would be, you could try adding a HAVING "Count" >= some_limit clause to reduce the number of records to be sorted.
There's nothing you can do to avoid the grouping; all you can hope for are small gains from generic optimizations.
Another way would be to precompute the values for this query: maintain a separate table with the counts, and update it whenever you add a log entry. This makes inserts slower, and you'll have to decide what timestamp granularity to use.
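One possible shape for that summary table, maintained by a trigger (the LogEntryCounts table and trigger name are made up for illustration, and this sketch uses "all time" granularity rather than, say, per-day buckets):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LogEntries (
    ClientIpAddress TEXT, Agentstring TEXT, SiteIisId INTEGER, DateTime TEXT);

-- Hypothetical summary table holding the precomputed counts.
CREATE TABLE LogEntryCounts (
    ClientIpAddress TEXT, Agentstring TEXT, SiteIisId INTEGER, Count INTEGER,
    PRIMARY KEY (ClientIpAddress, Agentstring, SiteIisId));

-- Keep the summary current on every insert: bump an existing row,
-- otherwise create it with Count = 1.
CREATE TRIGGER LogEntries_count_ai AFTER INSERT ON LogEntries
BEGIN
    UPDATE LogEntryCounts SET Count = Count + 1
     WHERE ClientIpAddress = NEW.ClientIpAddress
       AND Agentstring = NEW.Agentstring
       AND SiteIisId = NEW.SiteIisId;
    INSERT OR IGNORE INTO LogEntryCounts VALUES
        (NEW.ClientIpAddress, NEW.Agentstring, NEW.SiteIisId, 1);
END;
""")
for _ in range(3):
    conn.execute("INSERT INTO LogEntries VALUES "
                 "('1.1.1.1', 'UA-1', 3, '2012-09-14 12:00:00')")
conn.execute("INSERT INTO LogEntries VALUES "
             "('2.2.2.2', 'UA-2', 3, '2012-09-14 12:00:00')")

# The expensive GROUP BY is replaced by a cheap read of the summary table.
top = conn.execute("""
    SELECT ClientIpAddress, Agentstring, Count FROM LogEntryCounts
    WHERE SiteIisId = 3 ORDER BY Count DESC LIMIT 5""").fetchall()
print(top)
```

The UPDATE-then-INSERT OR IGNORE pair works on any SQLite version; on SQLite 3.24+ a single UPSERT (INSERT ... ON CONFLICT DO UPDATE) would do the same job.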