I have the following tables:
CREATE TABLE IF NOT EXISTS stats (
date date NOT NULL DEFAULT '0000-00-00',
cid int(8) NOT NULL DEFAULT '0',
v bigint(15) NOT NULL DEFAULT '0',
c bigint(15) NOT NULL DEFAULT '0',
a bigint(15) NOT NULL DEFAULT '0',
PRIMARY KEY (date,cid),
KEY date (date),
KEY cid (cid),
KEY date_cid_vca (date,cid,v,c,a)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
This table has 30,842,712 rows, and:
CREATE TABLE IF NOT EXISTS camp (
id int(8) NOT NULL AUTO_INCREMENT,
name varchar(80) NOT NULL DEFAULT '',
PRIMARY KEY (id,name)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
This table has 1,985 rows.
I have the following query:
SELECT
c.id,
c.name,
SUM(s.v) AS sumv,
SUM(s.c) AS sumc,
GREATEST(((SUM(s.c)/SUM(s.v))*100.00), 0.00) AS cratio,
SUM(s.a) AS suma,
GREATEST(((SUM(s.a)/SUM(s.c))*100.00), 0.00) AS aratio
FROM
stats s, camp c
WHERE
s.date >= '2012-02-01' AND
s.date <= '2012-02-29' AND
c.id=s.cid
GROUP BY s.cid;
EXPLAIN shows:
+----+-------------+-------+-------+-------------------------------+--------------+---------+---------------------+---------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-------------------------------+--------------+---------+---------------------+---------+-----------------------------------------------------------+
| 1 | SIMPLE | s | range | PRIMARY,date,cid,date_cid_vca | date_cid_vca | 3 | NULL | 1010265 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | c | ref | PRIMARY | PRIMARY | 4 | db.s.cid | 1 | Using index |
+----+-------------+-------+-------+-------------------------------+--------------+---------+---------------------+---------+-----------------------------------------------------------+
The problem is that the query takes around 50 seconds to complete even though it uses indexes. Is there any other way to optimize the query?
Thanks!
You've done well in optimizing this query with your indexes. I suppose you really do have over 1 million rows in stats in that date range. Unfortunately, joining (and then grouping) 1 million rows, even with a covering index, is a lot to ask of a database. For better performance, you will need to beef up the hardware, start denormalizing (put camp's columns inside stats to avoid the join), or keep running totals for each camp instead of calculating them on the fly.
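As a rough sketch of the running-totals idea, you could maintain a pre-aggregated summary table and refresh it periodically (the table and column names here are hypothetical, and the monthly granularity is just one possible choice):

```sql
-- Hypothetical summary table, one row per (month, campaign).
CREATE TABLE IF NOT EXISTS stats_monthly (
  month date NOT NULL,
  cid int(8) NOT NULL,
  sumv bigint(20) NOT NULL DEFAULT '0',
  sumc bigint(20) NOT NULL DEFAULT '0',
  suma bigint(20) NOT NULL DEFAULT '0',
  PRIMARY KEY (month, cid)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

-- Refresh one month's totals, e.g. from a nightly cron job.
-- REPLACE overwrites any existing rows for that month.
REPLACE INTO stats_monthly (month, cid, sumv, sumc, suma)
SELECT '2012-02-01', cid, SUM(v), SUM(c), SUM(a)
FROM stats
WHERE date BETWEEN '2012-02-01' AND '2012-02-29'
GROUP BY cid;
```

Your reporting query would then join camp against stats_monthly (1,985 rows per month at most) instead of scanning a million stats rows.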
Edit
Since avoiding the 1 million+ row join seems to have made a large impact, you can try something like this:
SELECT c.*, a.* FROM
(SELECT
SUM(s.v) AS sumv,
SUM(s.c) AS sumc,
GREATEST(((SUM(s.c)/SUM(s.v))*100.00), 0.00) AS cratio,
SUM(s.a) AS suma,
GREATEST(((SUM(s.a)/SUM(s.c))*100.00), 0.00) AS aratio,
s.cid
FROM
stats s
WHERE
s.date >= '2012-02-01'
AND s.date <= '2012-02-29'
GROUP BY s.cid) a
JOIN
camp c
ON c.id = a.cid;
This query does the join on the smaller result set.
The following query should allow it to use indexes more efficiently
SELECT
c.id,
c.name,
SUM(s.v) AS sumv,
SUM(s.c) AS sumc,
GREATEST(((SUM(s.c)/SUM(s.v))*100.00), 0.00) AS cratio,
SUM(s.a) AS suma,
GREATEST(((SUM(s.a)/SUM(s.c))*100.00), 0.00) AS aratio
FROM
camp c
INNER JOIN
stats s
ON
s.cid = c.id
AND s.date BETWEEN '2012-02-01' AND '2012-02-29'
GROUP BY c.id;
Also, I would consider deleting the date_cid_vca key, as it simply duplicates the entire table and so is not particularly useful. The above query should use the PRIMARY KEY to match rows from stats to camp based on date and cid, and while it's hard to be 100% certain without access to your database, I'm fairly confident that the above will improve your response times.
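If you do decide to drop that index, it is a one-line ALTER (test on a copy first, since rebuilding a 30-million-row MyISAM index later will take a while):

```sql
ALTER TABLE stats DROP KEY date_cid_vca;
```

Note that the single-column date index is also redundant in principle, because PRIMARY KEY (date, cid) already serves lookups on its leftmost column, date.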