Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Table Scan and Index Scan in SQL

What is the difference between Table scan and Index scan in SQL and where it is used specifically?

like image 670
Manisha Awasthi Avatar asked Jan 02 '12 16:01

Manisha Awasthi


People also ask

What is table scan and index scan?

Table scan means iterate over all table rows. Index scan means iterate over all index items, when item index meets search condition, table row is retrived through index. Usualy index scan is less expensive than a table scan because index is more flat than a table.

Is index scan better than table scan?

3) index scan is faster than a table scan because they look at sorted data and query optimizers know when to stop and look for another range. 4) index seek is the fastest way to retrieve data and it comes into the picture when your search criterion is very specific.

What is a index scan?

An index scan occurs when the database manager accesses an index to narrow the set of qualifying rows (by scanning the rows in a specified range of the index) before accessing the base table; to order the output; or to retrieve the requested column data directly ( index-only access ).

Is Clustered index scan same as table scan?

A table scan has to examine every single row of the table. The clustered index scan only needs to scan the index. It doesn't scan every record in the table. That's the point, really, of indices.


4 Answers

Table scan means iterate over all table rows.

Index scan means iterate over all index items, when item index meets search condition, table row is retrived through index.

Usualy index scan is less expensive than a table scan because index is more flat than a table.

They are lot of bibliografy about this issue. Sample:

  • Microsoft: Which is Faster: Index Access or Table Scan?:

Index access is an access method in which SQL Server uses an existing index to read and write data pages. Because index access significantly reduces the number of I/O read operations, it often outperforms a table scan.

  • Oracle: The Query Optimizer

In this method, a row is retrieved by traversing the index, using the indexed column values specified by the statement. An index scan retrieves data from an index based on the value of one or more columns in the index. To perform an index scan, Oracle searches the index for the indexed column values accessed by the statement. If the statement accesses only columns of the index, then Oracle reads the indexed column values directly from the index, rather than from the table.

  • MySql: How to Avoid Table Scans
like image 109
dani herrera Avatar answered Oct 09 '22 15:10

dani herrera


Most query engines have a query optimizer, which tries to generate an effective query execution strategy. If indexes are available, which can make a query faster, then the query optimizer will perform an index scan or index seek, otherwise a table scan.

Example:

SELECT * FROM tbl WHERE category_id = 5;

If there is no index on category_id then a table scan will be performed, i.e. every single record in the table will be inspected for the right category_id.

If, however, category_id is indexed the things become more complicated. If the table is very large, an index seek will probably be chosen. However, if the table is small, then the optimizer might decide that a table scan is still faster, since some overhead is required to access an index. If the category_id is not selective enough, for instance if there are only two categories, scanning the table might be faster even for big tables.

Indexes are usually organized as tree structures. Finding an item in a tree is an O(log n) operation. A table scan is an O(n) operation. The speed is mainly determined by the number of disk accesses required to perform the query. Seeking the index first and then accessing the table for the found entries can generate more disk accesses for small tables.

Let us have a look at another query:

SELECT category_id FROM tbl WHERE category_id BETWEEN 10 AND 100;

Here there is another option available. An index seek might not be faster than a table scan in this situation, but, since we are only retrieving catergory_id's an index scan (not index seek) might be even faster. An index scan reads every entry of the index table instead of taking advantage of the tree structure (what the index seek does). However, since the requested information is fully contained in the index, no access to the data table will be required. Index scan is, like the table scan an O(n) operation, but since the index is usually smaller than the table, fewer disk accesses are required to scan the index than to scan the table.

The whole matter is very complicated and depends very much on the database engine. If you want to know more, read the documentation provided by the db vendor.

like image 21
Olivier Jacot-Descombes Avatar answered Oct 09 '22 15:10

Olivier Jacot-Descombes


As @danihp has answered the first part of the question I'll attempt to answer the second "where is it used specifically". This is for Oracle but it holds true for most RDBMS.

Let's assume we have a table my_table, which is indexed uniquely on a column id and has a second index, which is non-unique, on the column yet_another_column:

create my_table ( id varchar2(20) not null
                , another_column not null
                , yet_another_column
                , constraint pk_my_table primary key (id) 
                );

create index i_my_table on my_table ( yet_another_column );

Now, if we were to select * from my_table where id = '1' this would / should do a unique index scan of the index pk_my_table. Then we re-enter the table, using the index, to return everything in my_table where id = '1'.

If the query were, instead, select id from my_table where id = 'a' then there is no need for the second stage as all the values we need are contained within the index. In this case the query would solely do an unique index scan.

Next, if our query were select * from my_table where yet_another_column = 'y' then we have an index on the column but it is not unique so we're going to have to look through the entire index to try to find all values that match our where condition, i.e. an index scan. Once again we're select columns that are not in our index so we have to re-enter the table to get them.

Lastly, if our query were select id from my_table where another_column = 'yes'. We have no index on another_column so we have to do a table scan to find the value, i.e. we have to find everything in the table where another_column = 'yes'.

Now, there might not seem to be much of a difference, between a table scan and an index scan in these instances. We still have to go and find a value in an object in the database. However, as the index is much smaller and specially designed to be scanned ( see other answers ) it is generally much faster to do an index scan if you only want a small proportion of the rows in the table. If you want say 10% of the table then this point becomes "it depends".

like image 26
Ben Avatar answered Oct 09 '22 16:10

Ben


For SQL Server at least:

An index scan can be faster because, presumably, the index doesn't cover the entire set of columns in the table, while a table (or clustered index) scan has to read all of the data. If an index does include all of the columns in the table, then it should be roughly equivalent to a table scan, and the choice between an index scan and table (or CIX) scan will be a coin toss. The difference is that when you have fewer columns in the index, you can fit more index rows on an 8kb page, leading to fewer overall pages you have to read in order to scan all of the data in the index.

To illustrate what I mean, imagine if you have two copies of the phone book, one with last name, first name, street address, and phone number, and one with just last name, first name, and phone number. Now imagine that because the street address doesn't have to be printed, you can fit two extra columns of names and phone numbers on any page in the phone book. The end result of this is that the phone book is thinner, because you can fit the same number of phone numbers on fewer pages. Next, imagine you are charged with counting the number of phone numbers in the book. Which would you choose, the one with the street address listed (which has more pages, analogous to a table scan) or the one without the street address (which has fewer pages, analogous to most index scans)? I would choose the one with fewer pages.

Another wrinkle in this is that some indexes can be filtered, meaning that not only do they have fewer columns in most cases (and therefore can fit more rows onto a single page), but they can also have a WHERE clause that eliminates a lot of rows. In this case, as well, an index scan will be better than a table scan (but this will only work for queries that have a matching WHERE clause and the same semantics).

like image 44
Aaron Bertrand Avatar answered Oct 09 '22 14:10

Aaron Bertrand