I am trying to optimize a bigger query and hit a wall when I realized that this part of the query does a full table scan, which does not make sense to me considering the field in question is the primary key. I would have assumed that the MySQL optimizer would use the index.
Here is the table:
CREATE TABLE userapplication (
application_id int(11) NOT NULL auto_increment,
userid int(11) NOT NULL default '0',
accountid int(11) NOT NULL default '0',
resume_id int(11) NOT NULL default '0',
coverletter_id int(11) NOT NULL default '0',
user_email varchar(100) NOT NULL default '',
account_name varchar(200) NOT NULL default '',
resume_name varchar(255) NOT NULL default '',
resume_modified datetime NOT NULL default '0000-00-00 00:00:00',
cover_name varchar(255) NOT NULL default '',
cover_modified datetime NOT NULL default '0000-00-00 00:00:00',
application_status tinyint(4) NOT NULL default '0',
application_created datetime NOT NULL default '0000-00-00 00:00:00',
application_modified timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
publishid int(11) NOT NULL default '0',
application_visible int(11) default '1',
PRIMARY KEY (application_id),
KEY publishid (publishid),
KEY application_status (application_status),
KEY userid (userid),
KEY accountid (accountid),
KEY application_created (application_created),
KEY resume_id (resume_id),
KEY coverletter_id (coverletter_id)
) ENGINE=MyISAM ;
This simple query seems to do a full table scan:
SELECT * FROM userapplication WHERE application_id > 1025;
This is the output of the EXPLAIN:
+----+-------------+-----------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table           | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+-----------------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | userapplication | ALL  | PRIMARY       | NULL | NULL    | NULL | 784422 | Using where |
+----+-------------+-----------------+------+---------------+------+---------+------+--------+-------------+
Any ideas how to prevent this simple query from doing a full table scan? Or am I out of luck?
The Benefits and Drawbacks of Using Indexes in MySQL
Indexes consume disk space. Indexes degrade the performance of INSERT, UPDATE and DELETE queries: when data is updated, the index needs to be updated together with it. MySQL does not protect you from using multiple types of indexes at the same time.
Solution #1: OPTIMIZE
If MySQL got it wrong, it may be because the table has changed frequently, which can leave the index statistics out of date. If we can spare the time (the table is locked during the operation), we could help out by rebuilding the table with OPTIMIZE TABLE.
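A minimal sketch of that maintenance step, assuming the userapplication table from the question and a window in which the table can be locked. ANALYZE TABLE refreshes the key distribution statistics; OPTIMIZE TABLE is the heavier option and, for MyISAM, also defragments the data file and sorts the index pages. Whether the plan changes afterwards depends on the data, so the last statement simply re-checks it.

```sql
-- Refresh the key distribution statistics the optimizer relies on.
ANALYZE TABLE userapplication;

-- Heavier option: rebuild the table (locks it while running).
OPTIMIZE TABLE userapplication;

-- Check whether the chosen plan is any different now.
EXPLAIN SELECT * FROM userapplication WHERE application_id > 1025;
```

If the condition still matches most of the table, a full scan may remain the plan the optimizer picks even with fresh statistics.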
The Drawbacks of Using Indexes
Indexes consume disk space: an index occupies its own space, so indexed data consumes more disk space too. Redundant and duplicate indexes can be a problem: MySQL allows you to create duplicate indexes on a column and does not "protect you" from making that mistake.
Indexes should not be used on small tables. Indexes should not be used on columns that return a high percentage of data rows when used as a filter condition in a query's WHERE clause. For instance, you would not have an entry for the word "the" or "and" in the index of a book.
You'd probably be better off letting MySQL decide on the query plan. There is a good chance that doing an index scan would be less efficient than a full table scan.
There are two data structures on disk for this table: the table data itself and the primary key B-tree index.
When you run a query the optimizer has two options about how to access the data:
SELECT * FROM userapplication WHERE application_id > 1025;
Using The Index
Scan the primary key index for the entries with application_id > 1025, then look up each matching row in the table.
Not using the Index
Scan the entire table, and pick the appropriate records.
Choosing the best strategy
The job of the query optimizer is to choose the most efficient strategy for getting the data you want. If there are a lot of rows with an application_id > 1025, then it can actually be less efficient to use the index. For example, if 90% of the records have an application_id > 1025, then the query optimizer would have to scan around 90% of the leaf nodes of the B-tree index and then read at least 90% of the table as well to get the actual data; this would involve reading more data from disk than just scanning the table.
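To see which side of that trade-off this table is on, it can help to measure how selective the condition really is. The query below is a sketch that assumes the userapplication table from the question; the column aliases are made up for illustration. It relies on MySQL evaluating a comparison to 1 or 0, so SUM() counts the matching rows.

```sql
-- Count all rows, the rows matching the condition, and the share they make up.
SELECT
    COUNT(*)                             AS total_rows,
    SUM(application_id > 1025)           AS matching_rows,
    SUM(application_id > 1025) / COUNT(*) AS matching_fraction
FROM userapplication;
```

With an auto-increment key and the roughly 784,000 rows estimated in the EXPLAIN output, a cutoff of 1025 probably matches nearly every row, in which case a full table scan is the expected choice. That is a guess from the numbers shown; the query gives the actual figure.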
MyISAM tables are not clustered: a PRIMARY KEY index is stored like a secondary index and requires an additional table lookup to get the other values.
It is several times more expensive to traverse the index and do the lookups. If your condition is not very selective (it yields a large share of the total records), MySQL will consider a table scan cheaper.
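One way to see the cost of that extra lookup, as a sketch against the table from the question: ask only for the indexed column, so no row has to be fetched from the data file. For a query like this MySQL can usually answer from the index alone, which EXPLAIN reports as "Using index" in the Extra column, though the plan on a given server may differ.

```sql
-- Only the indexed column is requested, so the index alone can satisfy the query.
EXPLAIN SELECT application_id
FROM userapplication
WHERE application_id > 1025;

-- SELECT * needs every column, so each index entry would require a row lookup.
EXPLAIN SELECT *
FROM userapplication
WHERE application_id > 1025;
```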
To prevent it from doing a table scan, you could add a hint, though it would not necessarily be more efficient:
SELECT *
FROM userapplication FORCE INDEX (PRIMARY)
WHERE application_id > 1025;
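Since the hint is not guaranteed to help, it is worth comparing both plans before keeping it. The statements below are a sketch using the table from the question; FORCE INDEX and IGNORE INDEX are standard MySQL index hints.

```sql
-- Plan when the primary key is forced.
EXPLAIN SELECT *
FROM userapplication FORCE INDEX (PRIMARY)
WHERE application_id > 1025;

-- Plan when the primary key is excluded, which leaves a full table scan.
EXPLAIN SELECT *
FROM userapplication IGNORE INDEX (PRIMARY)
WHERE application_id > 1025;
```

Running the two SELECT statements without EXPLAIN and comparing their execution times on realistic data shows which plan is actually faster for this table.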