Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

COUNT and GROUP BY on text fields seems slow

I'm building a MySQL database which contains entries about special substrings of DNA in species of yeast. My table looks like this:

+--------------+---------+------+-----+---------+-------+
| Field        | Type    | Null | Key | Default | Extra |
+--------------+---------+------+-----+---------+-------+
| species      | text    | YES  | MUL | NULL    |       |
| region       | text    | YES  | MUL | NULL    |       |
| gene         | text    | YES  | MUL | NULL    |       |
| startPos     | int(11) | YES  |     | NULL    |       |
| repeatLength | int(11) | YES  |     | NULL    |       |
| coreLength   | int(11) | YES  |     | NULL    |       |
| sequence     | text    | YES  | MUL | NULL    |       |
+--------------+---------+------+-----+---------+-------+

There are approximately 1.8 million records. In one type of query I want to see how many DNA substrings are associated with each type of species and region, so I issue this query:

select species, region, count(*) group by species, region;

The species and region columns have only two possible entries (conserved/scer for species, and promoter/coding for region) yet this query takes about 30 seconds.

Is this a normal amount of time to expect for this type of query given the size of the table? Is it slow because I'm using text fields instead of simple integer or boolean values (I prefer text fields as several non-CS researchers will be using the DB). Any other ideas and suggestions would be welcome.

Please excuse if this is a boneheaded question, I am an SQL neophyte.

P.S. I've also seen this question but the proposed solution doesn't seem relevant for what I'm doing.

EDIT: Converting those fields to VARCHARs reduced the runtime to ~2.5 seconds. Note I also timed it against ENUMs which had a similar timing.

like image 788
Rich Avatar asked Jul 22 '10 02:07

Rich


People also ask

Does GROUP BY Make query slower?

Improving performance with SQL aggregate functions The main problem with GROUP BY is that queries involving it are usually slow, especially when compared with WHERE -only queries.

Does Count work without GROUP BY?

Using COUNT, without GROUP BY clause will return a total count of a number of rows present in the table. Adding GROUP BY, we can COUNT total occurrences for each unique value present in the column. we can use the following command to create a database called geeks.


1 Answers

Why're all your string based columns defined as TEXT? If you read the performance comparison, you'll see that TEXT was ~3x slower than a VARCHAR column using identical indexing: http://forums.mysql.com/read.php?24,105964,105964

like image 156
OMG Ponies Avatar answered Sep 25 '22 23:09

OMG Ponies