I have a very large (80+ million row) de-normalized MySQL table. A simplified schema looks like:
+-----------+-------------+--------------+--------------+ | ID | PARAM1 | PARAM2 | PARAM3 | +-----------+-------------+--------------+--------------+ | 1 | .04 | .87 | .78 | +-----------+-------------+--------------+--------------+ | 2 | .12 | .02 | .76 | +-----------+-------------+--------------+--------------+ | 3 | .24 | .92 | .23 | +-----------+-------------+--------------+--------------+ | 4 | .65 | .12 | .01 | +-----------+-------------+--------------+--------------+ | 5 | .98 | .45 | .65 | +-----------+-------------+--------------+--------------+
I'm trying to see if there's a way to optimize a query in which I apply a weight to each PARAM column (where weight is between 0 and 1) and then average them to come up with a computed value SCORE. Then I want to ORDER BY that computed SCORE column.
For example, assuming the weighting for PARAM1 is .5, the weighting for PARAM2 is .23 and the weighting for PARAM3 is .76, you would end up with something similar to:
SELECT ID, ((PARAM1 * .5) + (PARAM2 * .23) + (PARAM3 * .76)) / 3 AS SCORE
ORDER BY SCORE DESC LIMIT 10
With some proper indexing, this is fast for basic queries, but I can't figure out a good way to speed up the above query on such a large table.
Details:
--EDIT--
A simplified version of the problem follows.
This runs in a reasonable amount of time:
SELECT value1, value2
FROM sometable
WHERE id = 1
ORDER BY value2
This does not run in a reasonable amount of time:
SELECT value1, (value2 * an_arbitrary_float) as value3
FROM sometable
WHERE id = 1
ORDER BY value3
Using the above example, is there any solution that allows me to do an ORDER BY with out computing value3 ahead of time?
Remove any unnecessary indexes on the table, paying particular attention to UNIQUE indexes as these disable change buffering. Don't use a UNIQUE index unless you need it; instead, employ a regular INDEX. Take a look at your slow query log every week or two. Pick the slowest three queries and optimize those.
Yes, column order does matter.
I've found 2 (sort of obvious) things that have helped speed this query up to a satisfactory level:
Minimize the number of rows that need to be sorted. By using an index on the 'id' field and a subselect to trim the number of records first, the file sort on the computed column is not that bad. Ie:
SELECT t.value1, (t.value2 * an_arbitrary_float) as SCORE
FROM (SELECT * FROM sometable WHERE id = 1) AS t
ORDER BY SCORE DESC
Try increasing sort_buffer_size in my.conf to speed up those filesorts.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With