Efficiently querying a 15,000,000 rows table in MySQL

Question

Consider the following database tables:

Table "messages" with 13,000,000 rows (one row per message).
Table "users" with 3,000,000 rows (one row per user).

The following query is used to fetch a bunch of messages and the corresponding users:

SELECT messages.id, messages.message, users.id, users.username
FROM messages
INNER JOIN users ON messages.user_id=users.id 
WHERE messages.id in (?, ?, ?, ? ... a total of 100 "?":s);

100 messages are fetched in each query.

"messages" is indexed on id (primary key, BIGINT not auto-generated) and user_id.

"users" is indexed on id (primary key, INT auto-generated).

The database is MySQL using MyISAM.

Currently the query takes well over 3000 ms to execute which puzzles me since "messages" is indexed on "id", so retrieving the correct rows should be very quick.

My question is: Given the describe scenario and setup, is a 3000 ms query time "normal" or am I missing something? Please let me know if further details are required.

Update #1: Here are the table definitions:

CREATE TABLE messages (
  id bigint(20) NOT NULL DEFAULT '0',
  user_id int(11) NOT NULL DEFAULT '0',
  message varchar(160) NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  KEY user_id (user_id),
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE users (
  id int(11) NOT NULL DEFAULT '0',
  username varchar(32) NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  UNIQUE KEY username (username),
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

The only "non-standard" thing I observe in the definitions is that "messages.id" is a BIGINT rather than an INT. Could that be a hint?

Mike Venzke · Accepted Answer

I've worked on MyISAM tables with billions of rows, and one of the things I found after some row count limit was that the optimizer took far too long to decide how to approach a query, and wrongly performed some table scans. I can't find the exact page I found describing it, but I started always using FORCE_INDEX on each segment of queries where I knew how it should request objects

http://dev.mysql.com/doc/refman/5.1/en/index-hints.html

The fact of the matter is, if you're using tables that large, you need to be designing each and every query to work with your indices, so there's nothing wrong with forcing an index. It will still scan the table if it has to, but FORCE_INDEX tells it not to unless it absolutely has to.

Also, if your tables are large, I assume your indices are also large. You absolutely need to make sure you have proper configuration settings and that your key_buffer is an adequate size and you have enough i/o. If you're running 32bit mysql (which you shouldn't be), then put your key_buffer as large as 1GB (asuming you have 1GB to spare) and check its usage with 'mysqlreport'

If you're running 64bit mysql, opt for making it as large as you can, while still leaving space for the OS to cache files and whatever other applications you have running, so maybe a couple of GB if you can.

Even if your queries use indices, if the index can't be properly buffered in memory, you're still hitting the disk and there's a performance hit proportional to the index size and speed of disk / available i/o.

As far as int vs big int, the only noticeable performance difference I've seen is in performing calculations on big ints, like SUM. SUM is appreciably slower on big int than on int, so much so that I'd look at storing numbers in a different magnitude, or separating them out to two ints if you need to perform frequent calculations on them.

Efficiently querying a 15,000,000 rows table in MySQL

Tags:

optimization

indexing

mysql

query-optimization

knorv

1 Answers

Mike Venzke

Recent Activity

Donate For Us

Efficiently querying a 15,000,000 rows table in MySQL

Tags:

optimization

indexing

mysql

query-optimization

knorv

1 Answers

Mike Venzke

Related questions

Recent Activity

Donate For Us