
Why does MySQL not use an index for a greater than comparison?

While optimizing a larger query, I found that this part of it does a full table scan. That makes no sense to me, given that the field in question is the primary key. I would expect the MySQL optimizer to use the index.

Here is the table:


CREATE TABLE userapplication (
  application_id int(11) NOT NULL auto_increment,
  userid int(11) NOT NULL default '0',
  accountid int(11) NOT NULL default '0',
  resume_id int(11) NOT NULL default '0',
  coverletter_id int(11) NOT NULL default '0',
  user_email varchar(100) NOT NULL default '',
  account_name varchar(200) NOT NULL default '',
  resume_name varchar(255) NOT NULL default '',
  resume_modified datetime NOT NULL default '0000-00-00 00:00:00',
  cover_name varchar(255) NOT NULL default '',
  cover_modified datetime NOT NULL default '0000-00-00 00:00:00',
  application_status tinyint(4) NOT NULL default '0',
  application_created datetime NOT NULL default '0000-00-00 00:00:00',
  application_modified timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  publishid int(11) NOT NULL default '0',
  application_visible int(11) default '1',
  PRIMARY KEY  (application_id),
  KEY publishid (publishid),
  KEY application_status (application_status),
  KEY userid (userid),
  KEY accountid (accountid),
  KEY application_created (application_created),
  KEY resume_id (resume_id),
  KEY coverletter_id (coverletter_id)
) ENGINE=MyISAM;

This simple query seems to do a full table scan:

SELECT * FROM userapplication WHERE application_id > 1025;

This is the output of the EXPLAIN:

+----+-------------+-----------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table           | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+-----------------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | userapplication | ALL  | PRIMARY       | NULL | NULL    | NULL | 784422 | Using where |
+----+-------------+-----------------+------+---------------+------+---------+------+--------+-------------+

Any ideas how to prevent this simple query from doing a full table scan? Or am I out of luck?

Robin, asked Jan 14 '11 13:01


2 Answers

You'd probably be better off letting MySQL decide on the query plan. There is a good chance that an index scan would be less efficient than a full table scan.

Two on-disk data structures matter for this query:

  1. The table itself; and
  2. The primary key B-Tree index.

When you run this query, the optimizer has two options for accessing the data:

SELECT * FROM userapplication WHERE application_id > 1025;

Using the index

  1. Scan the B-Tree index to find the addresses of all rows where application_id > 1025.
  2. Read the appropriate pages of the table to get the data for these rows.

Not using the index

Scan the entire table and pick out the matching records.

Choosing the best strategy

The job of the query optimizer is to choose the most efficient strategy for getting the data you want. If a lot of rows have an application_id > 1025, using the index can actually be less efficient. For example, if 90% of the records have an application_id > 1025, the query optimizer would have to scan around 90% of the leaf nodes of the B-Tree index and then read at least 90% of the table as well to get the actual data. That would involve reading more data from disk than just scanning the table.
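One way to see which side of that trade-off a query falls on is to measure how selective the predicate is. The following is a minimal sketch against the table from the question. The column aliases are made up for illustration, and the point at which the optimizer switches plans depends on its own cost estimates, not on any fixed percentage.

```sql
-- Share of rows matched by the range predicate.
-- In MySQL a comparison evaluates to 1 or 0, so SUM() counts the matches.
SELECT COUNT(*)                              AS total_rows,
       SUM(application_id > 1025)            AS matching_rows,
       SUM(application_id > 1025) / COUNT(*) AS matching_share
FROM   userapplication;
```

With roughly 784,000 rows in the table and an auto-increment key, a threshold of 1025 likely matches nearly every row, in which case a full scan is the expected choice.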

Andrew Skirrow, answered Sep 28 '22 03:09


MyISAM tables are not clustered, so the PRIMARY KEY index behaves like a secondary index and requires an additional table lookup to get the other column values.

Traversing the index and doing the lookups is several times more expensive per row than reading the rows sequentially. If your condition is not very selective (it yields a large share of the total records), MySQL will consider a table scan cheaper.
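The extra lookup is only needed when the query asks for columns that are not in the index. As a sketch, compare the plans for the two queries below. When only the key column is selected, MySQL can usually answer from the index alone, which EXPLAIN reports as "Using index" in the Extra column. The exact plan depends on the server version and table statistics, so treat this as something to try rather than a guaranteed result.

```sql
-- Needs only the indexed column, so no lookup into the data file is required.
EXPLAIN SELECT application_id
FROM    userapplication
WHERE   application_id > 1025;

-- Needs every column, so each index entry would require a row lookup.
EXPLAIN SELECT *
FROM    userapplication
WHERE   application_id > 1025;
```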

To prevent it from doing a table scan, you could add an index hint, though the result would not necessarily be more efficient:

SELECT  *
FROM    userapplication FORCE INDEX (PRIMARY)
WHERE   application_id > 1025;
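To check whether the hint actually helps, one option is to compare the plans with and without it, and to try a narrower range for comparison. The upper bound 2025 below is an arbitrary illustrative value, not something from the question.

```sql
-- Plan with the hint: check whether the type column changes from ALL
-- and whether the key column now shows PRIMARY.
EXPLAIN SELECT *
FROM    userapplication FORCE INDEX (PRIMARY)
WHERE   application_id > 1025;

-- A bounded range matches far fewer rows, so the optimizer may pick
-- the primary key on its own, without any hint.
EXPLAIN SELECT *
FROM    userapplication
WHERE   application_id > 1025
  AND   application_id <= 2025;
```

A plan that uses the index is not automatically faster, so it is worth timing both versions of the real query on representative data before keeping the hint.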

Quassnoi, answered Sep 28 '22 04:09