Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Speed Up MySQL (MyISAM) COUNTs with WHERE Clauses

We are implementing a system that analyses books. The system is written in PHP, and for each book loops through the words and analyses each of them, setting certain flags (that translate to database fields) from various regular expressions and other tests.

This results in a matches table, similar to the example below:

+------------------------+--------------+------+-----+---------+----------------+
| Field                  | Type         | Null | Key | Default | Extra          |
+------------------------+--------------+------+-----+---------+----------------+
| id                     | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| regex                  | varchar(250) | YES  |     | NULL    |                |
| description            | varchar(250) | NO   |     | NULL    |                |
| phonic_description     | varchar(255) | NO   |     | NULL    |                |
| is_high_frequency      | tinyint(1)   | NO   |     | NULL    |                |
| is_readable            | tinyint(1)   | NO   |     | NULL    |                |
| book_id                | bigint(20)   | YES  |     | NULL    |                |
| matched_regex          | varchar(255) | YES  |     | NULL    |                |
| [...]                  |              |      |     |         |                |
+------------------------+--------------+------+-----+---------+----------------+

Most of the omitted fields are tinyint, either 0 or 1. There are currently 25 fields in the matches table.

There are ~2,000,000 rows in the matches table, the output of analyzing ~500 books.

Currently, there is a "reports" area of the site which queries the matches table like this:

SELECT COUNT(*)
FROM matches
WHERE is_readable = 1
AND other_flag = 0
AND another_flag = 1

However, at present it takes over a minute to fetch the main index report as each query takes about 0.7 seconds. I am caching this at a query level, but it still takes too long for the initial page load.

As I am not very experienced in how to manage datasets such as this, can anyone advise me of a better way to store or query this data? Are there any optimisations I can use with MySQL to improve the performance of these COUNTs, or am I better off using another database or data structure?

We are currently using MySQL with MyISAM tables and a VPS for this, so switching to a new database system altogether isn't out of the question.

like image 769
Alan Edwardes Avatar asked Jan 17 '23 18:01

Alan Edwardes


2 Answers

You need to use indexes, create them on the columns you do a WHERE on most frequently.

ALTER TABLE `matches` ADD INDEX ( `is_readable` ) 

etc..

You can also create indexes based on multiple columns, if your doing the same type of query over and over its useful. phpMyAdmin has the index option on the structure page of the table at the bottom.

like image 166
fire Avatar answered Jan 26 '23 00:01

fire


Add multi index to this table as you are selecting by more than one field. Below index should help a lot. Those type of indexes are very good for boolean / int columns. For indexes with varchar values read more here: http://dev.mysql.com/doc/refman/5.0/en/create-index.html

ALTER TABLE `matches` ADD INDEX ( `is_readable`, `other_flag`, `another_flag` ) 

One more thing is to check your queries by using EXPLAIN {YOUR WHOLE SQL STATEMENT} to check which index is used by DB. So in this example you should run query:

EXPLAIN ALTER TABLE `matches` ADD INDEX ( `is_readable`, `other_flag`, `another_flag` ) 

More info on EXPLAIN: http://dev.mysql.com/doc/refman/5.0/en/explain.html

like image 35
Marek Tuchalski Avatar answered Jan 26 '23 00:01

Marek Tuchalski