We are implementing a system that analyses books. The system is written in PHP, and for each book loops through the words and analyses each of them, setting certain flags (that translate to database fields) from various regular expressions and other tests.
This results in a matches
table, similar to the example below:
+------------------------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +------------------------+--------------+------+-----+---------+----------------+ | id | bigint(20) | NO | PRI | NULL | auto_increment | | regex | varchar(250) | YES | | NULL | | | description | varchar(250) | NO | | NULL | | | phonic_description | varchar(255) | NO | | NULL | | | is_high_frequency | tinyint(1) | NO | | NULL | | | is_readable | tinyint(1) | NO | | NULL | | | book_id | bigint(20) | YES | | NULL | | | matched_regex | varchar(255) | YES | | NULL | | | [...] | | | | | | +------------------------+--------------+------+-----+---------+----------------+
Most of the omitted fields are tinyint
, either 0 or 1. There are currently 25 fields in the matches table.
There are ~2,000,000 rows in the matches table, the output of analyzing ~500 books.
Currently, there is a "reports" area of the site which queries the matches
table like this:
SELECT COUNT(*)
FROM matches
WHERE is_readable = 1
AND other_flag = 0
AND another_flag = 1
However, at present it takes over a minute to fetch the main index report as each query takes about 0.7 seconds. I am caching this at a query level, but it still takes too long for the initial page load.
As I am not very experienced in how to manage datasets such as this, can anyone advise me of a better way to store or query this data? Are there any optimisations I can use with MySQL to improve the performance of these COUNT
s, or am I better off using another database or data structure?
We are currently using MySQL with MyISAM tables and a VPS for this, so switching to a new database system altogether isn't out of the question.
You need to use indexes, create them on the columns you do a WHERE
on most frequently.
ALTER TABLE `matches` ADD INDEX ( `is_readable` )
etc..
You can also create indexes based on multiple columns, if your doing the same type of query over and over its useful. phpMyAdmin has the index option on the structure page of the table at the bottom.
Add multi index to this table as you are selecting by more than one field. Below index should help a lot. Those type of indexes are very good for boolean
/ int
columns. For indexes with varchar
values read more here: http://dev.mysql.com/doc/refman/5.0/en/create-index.html
ALTER TABLE `matches` ADD INDEX ( `is_readable`, `other_flag`, `another_flag` )
One more thing is to check your queries by using EXPLAIN {YOUR WHOLE SQL STATEMENT} to check which index is used by DB. So in this example you should run query:
EXPLAIN ALTER TABLE `matches` ADD INDEX ( `is_readable`, `other_flag`, `another_flag` )
More info on EXPLAIN: http://dev.mysql.com/doc/refman/5.0/en/explain.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With