Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimize MySQL nested select with arithmetic operation

Tags:

sql

select

mysql

I have this sql query running on MySQL 5.1 non-normalized table. It works the way i want it to, but it can be quite slow. I added an index on the day column but it still needs to be faster. Any suggestions on how to get this faster? (maybe with a join instead?)

SELECT DISTINCT(bucket) AS b,
       (possible_free_slots -
          (SELECT COUNT(availability)
           FROM ip_bucket_list
           WHERE bucket = b
           AND availability = 'used'
           AND tday = 'evening'
           AND day LIKE '2012-12-14%'
           AND network = '10_83_mh1_bucket')) AS free_slots
FROM ip_bucket_list
ORDER BY free_slots DESC;

The individual queries are fast:

SELECT DISTINCT(bucket) FROM ip_bucket_list;
1024 rows in set (0.05 sec)

 SELECT COUNT(availability) from ip_bucket_list WHERE bucket = 0 AND availability = 'used' AND tday = 'evening' AND day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket';
1 row in set (0.00 sec)

Table:

mysql> describe ip_bucket_list;
+---------------------+--------------+------+-----+-------------------+----------------+
| Field               | Type         | Null | Key | Default           | Extra          |
+---------------------+--------------+------+-----+-------------------+----------------+
| id                  | int(11)      | NO   | PRI | NULL              | auto_increment |
| ip                  | varchar(50)  | YES  |     | NULL              |                |
| bucket              | int(11)      | NO   | MUL | NULL              |                |
| availability        | varchar(20)  | YES  |     | NULL              |                |
| network             | varchar(100) | NO   | MUL | NULL              |                |
| possible_free_slots | int(11)      | NO   |     | NULL              |                |
| tday                | varchar(20)  | YES  |     | NULL              |                |
| day                 | timestamp    | NO   | MUL | CURRENT_TIMESTAMP |                |
+---------------------+--------------+------+-----+-------------------+----------------+

and the DESC:

DESC SELECT DISTINCT(bucket) as b,(possible_free_slots - (SELECT COUNT(availability) from  ip_bucket_list WHERE bucket = b AND availability = 'used' AND tday = 'evening' AND day  LIKE '2012-12-14%' AND network = '10_83_mh1_bucket')) as free_slots FROM ip_bucket_list  ORDER BY free_slots DESC;
+----+--------------------+----------------+------+-----------------------------------------+--------+---------+------+--------+---------------------------------+
| id | select_type        | table          | type | possible_keys                           | key    | key_len | ref  | rows   | Extra                           |
+----+--------------------+----------------+------+-----------------------------------------+--------+---------+------+--------+---------------------------------+
|  1 | PRIMARY            | ip_bucket_list | ALL  | NULL                                    | NULL   | NULL    | NULL | 328354 | Using temporary; Using filesort |
|  2 | DEPENDENT SUBQUERY | ip_bucket_list | ref  | bucket,network,ip_bucket_list_day_index | bucket | 4       | func |    161 | Using where                     |
+----+--------------------+----------------+------+-----------------------------------------+--------+---------+------+--------+---------------------------------+
like image 285
Andrew Avatar asked Oct 22 '22 22:10

Andrew


1 Answers

I would move the correlated subquery from the SELECT clause into the FROM clause, using a join:

SELECT distinct bucket as b,
       (possible_free_slots - a.avail) as free_slots
FROM ip_bucket_list ipbl left outer join
     (SELECT bucket COUNT(availability) as avail
      from ip_bucket_list
      WHERE availability = 'used' AND tday = 'evening' AND
             day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket'
     ) on a
     on ipbl.bucket = avail.bucket
ORDER BY free_slots DESC;

The version in the SELECT clause is probably being re-run for every row (even before the distinct is running). By putting it in the from clause, the ip_bucket_list table will be scanned only once.

Also, if you are expecting each bucket to only show up once, then I would recommend that you use group by rather than distinct. It would clarify the purpose of the query. You may be able to eliminate the second reference to the table altogether, with something like:

SELECT bucket as b,
       max(possible_free_slots -
           (case when availability = 'used' AND tday = 'evening' AND
                      day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket'
                 then 1 else 0
            end)
           ) as free_slots
FROM ip_bucket_list
group by bucket
ORDER BY free_slots DESC;

To speed up your version of the query, you need an index on bucket, because this is used for the correlated subquery.

like image 164
Gordon Linoff Avatar answered Oct 27 '22 20:10

Gordon Linoff