Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to do fast counting on large tables?

I have large MySQL tables with hundreds of thousands of rows.

I need to write a query on a customers table which gets the count of when customers will be available to contact again.

eg.

SELECT 'This week', COUNT(*) FROM customers 
WHERE sales_person_id = 1 AND DATEDIFF(NOW(), available_date) < 7

UNION

SELECT 'Next week', COUNT(*) FROM customers 
WHERE sales_person_id = 1 AND DATEDIFF(NOW(), available_date) >= 7 
    AND DATEDIFF(NOW(), available_date) < 14

UNION

... (a few more like this)

Having written a similar query on a different large table, I did notice that changing the engine from InnoDB to MyISAM sped up the query considerably (InnoDB is not needed for these tables as they do not have foreign key checks). Is there anything else I can do to speed up counts like this (other than indexing the appropriate fields)?

like image 838
Matt McCormick Avatar asked Jan 22 '23 08:01

Matt McCormick


2 Answers

WHERE sales_person_id = 1 AND available_date BETWEEN CURDATE() - INTERVAL 1 WEEK AND CURDATE()

Doing it this way should let MySQL use a composite index created on (sales_person_id, available_date) columns (use EXPLAIN to check)

like image 69
Mchl Avatar answered Jan 23 '23 22:01

Mchl


  1. Never do in multiple queries, what you can do in one.

    If you create a derived table/inline view that has the necessary start & end dates, this can be used to produce the desired result in a single query using a GROUP BY. MySQL doesn't have a recursive function, so you have to use the NUMBERS table trick to generate dates...

    1. Create a table that only holds incrementing numbers - easy to do using an auto_increment:

      DROP TABLE IF EXISTS `example`.`numbers`;
      CREATE TABLE  `example`.`numbers` (
       `id` int(10) unsigned NOT NULL auto_increment,
        PRIMARY KEY  (`id`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
      
    2. Populate the table using:

      INSERT INTO NUMBERS (id)
      VALUES (NULL)
      

      ...for as many values as you need.

    3. Use DATE_ADD to construct a list of dates, increasing the days based on the NUMBERS.id value.

      SELECT x.start_dt,
             x.end_dt
        FROM (SELECT DATE_ADD(NOW(), INTERVAL n.id - 1 DAY) AS start_dt,
                     DATE_ADD(NOW(), INTERVAL n.id + 6 DAY) AS end_dt
                FROM `numbers` n
               WHERE DATE_ADD(NOW(), INTERVAL (n.id - 1) DAY) <= '2011-01-01') x
      
    4. JOIN onto your table of data based on the datetime portion:

        SELECT x.start_dt,
               x.end_dt,
               COUNT(*) AS num
          FROM (SELECT DATE_ADD(NOW(), INTERVAL n.id - 1 DAY) AS start_dt,
                       DATE_ADD(NOW(), INTERVAL n.id + 6 DAY) AS end_dt
                  FROM `numbers` n
                 WHERE DATE_ADD(NOW(), INTERVAL (n.id - 1) DAY) <= '2011-01-01') x
          JOIN CUSTOMERS c ON c.available_date BETWEEN x.start_dt
                                                   AND x.end_dt
      GROUP BY x.start_dt, x.end_dt
      
  2. Don't use functions performed upon actual column data - IE: DATEDIFF(NOW(), *available_date*) - because a database can't use an index (if one exists) on the available_date column because the data has been altered away from the index value.

like image 28
OMG Ponies Avatar answered Jan 23 '23 20:01

OMG Ponies