What is the difference between Table scan and Index scan in SQL and where it is used specifically?

Table scan means iterate over all table rows. Index scan means iterate over all index items, when item index meets search condition, table row is retrived through index. Usualy index scan is less expensive than a table scan because index is more flat than a table. They are lot of bibliografy about this issue. Sample: <ul> <li>Microsoft: Which is Faster: Index Access or Table Scan?:</li> </ul> <blockquote> Index access is an access method in which SQL Server uses an existing index to read and write data pages. Because index access significantly reduces the number of I/O read operations, it often outperforms a table scan. </blockquote> <ul> <li>Oracle: The Query Optimizer </li> </ul> <blockquote> In this method, a row is retrieved by traversing the index, using the indexed column values specified by the statement. An index scan retrieves data from an index based on the value of one or more columns in the index. To perform an index scan, Oracle searches the index for the indexed column values accessed by the statement. If the statement accesses only columns of the index, then Oracle reads the indexed column values directly from the index, rather than from the table. </blockquote> <ul> <li>MySql: How to Avoid Table Scans </li> </ul>

As @danihp has answered the first part of the question I'll attempt to answer the second "where is it used specifically". This is for Oracle but it holds true for most RDBMS. Let's assume we have a table <code>my_table</code>, which is indexed uniquely on a column <code>id</code> and has a second index, which is non-unique, on the column <code>yet_another_column</code>: <pre class="prettyprint"><code>create my_table ( id varchar2(20) not null , another_column not null , yet_another_column , constraint pk_my_table primary key (id) ); create index i_my_table on my_table ( yet_another_column ); </code></pre> Now, if we were to <code>select * from my_table where id = '1'</code> this would / should do a unique index scan of the index <code>pk_my_table</code>. Then we re-enter the table, using the index, to return everything in <code>my_table</code> where <code>id = '1'</code>. If the query were, instead, <code>select id from my_table where id = 'a'</code> then there is no need for the second stage as all the values we need are contained within the index. In this case the query would solely do an unique index scan. Next, if our query were <code>select * from my_table where yet_another_column = 'y'</code> then we have an index on the column but it is not unique so we're going to have to look through the entire index to try to find all values that match our where condition, i.e. an index scan. Once again we're select columns that are not in our index so we have to re-enter the table to get them. Lastly, if our query were <code>select id from my_table where another_column = 'yes'</code>. We have no index on <code>another_column</code> so we have to do a table scan to find the value, i.e. we have to find everything in the table <code>where another_column = 'yes'</code>. Now, there might not seem to be much of a difference, between a table scan and an index scan in these instances. We still have to go and find a value in an object in the database. However, as the index is much smaller and specially designed to be scanned ( see other answers ) it is generally much faster to do an index scan if you only want a small proportion of the rows in the table. If you want say 10% of the table then this point becomes "it depends".

Table Scan and Index Scan in SQL

4 Answers

Table scan means iterate over all table rows.

Index scan means iterate over all index items, when item index meets search condition, table row is retrived through index.

Usualy index scan is less expensive than a table scan because index is more flat than a table.

They are lot of bibliografy about this issue. Sample:

Microsoft: Which is Faster: Index Access or Table Scan?:

Index access is an access method in which SQL Server uses an existing index to read and write data pages. Because index access significantly reduces the number of I/O read operations, it often outperforms a table scan.

Oracle: The Query Optimizer

In this method, a row is retrieved by traversing the index, using the indexed column values specified by the statement. An index scan retrieves data from an index based on the value of one or more columns in the index. To perform an index scan, Oracle searches the index for the indexed column values accessed by the statement. If the statement accesses only columns of the index, then Oracle reads the indexed column values directly from the index, rather than from the table.

MySql: How to Avoid Table Scans

109

answered Oct 09 '22 15:10

dani herrera

Most query engines have a query optimizer, which tries to generate an effective query execution strategy. If indexes are available, which can make a query faster, then the query optimizer will perform an index scan or index seek, otherwise a table scan.

Example:

SELECT * FROM tbl WHERE category_id = 5;

If there is no index on category_id then a table scan will be performed, i.e. every single record in the table will be inspected for the right category_id.

If, however, category_id is indexed the things become more complicated. If the table is very large, an index seek will probably be chosen. However, if the table is small, then the optimizer might decide that a table scan is still faster, since some overhead is required to access an index. If the category_id is not selective enough, for instance if there are only two categories, scanning the table might be faster even for big tables.

Indexes are usually organized as tree structures. Finding an item in a tree is an O(log n) operation. A table scan is an O(n) operation. The speed is mainly determined by the number of disk accesses required to perform the query. Seeking the index first and then accessing the table for the found entries can generate more disk accesses for small tables.

Let us have a look at another query:

SELECT category_id FROM tbl WHERE category_id BETWEEN 10 AND 100;

Here there is another option available. An index seek might not be faster than a table scan in this situation, but, since we are only retrieving catergory_id's an index scan (not index seek) might be even faster. An index scan reads every entry of the index table instead of taking advantage of the tree structure (what the index seek does). However, since the requested information is fully contained in the index, no access to the data table will be required. Index scan is, like the table scan an O(n) operation, but since the index is usually smaller than the table, fewer disk accesses are required to scan the index than to scan the table.

The whole matter is very complicated and depends very much on the database engine. If you want to know more, read the documentation provided by the db vendor.

answered Oct 09 '22 15:10

Olivier Jacot-Descombes

As @danihp has answered the first part of the question I'll attempt to answer the second "where is it used specifically". This is for Oracle but it holds true for most RDBMS.

Let's assume we have a table my_table, which is indexed uniquely on a column id and has a second index, which is non-unique, on the column yet_another_column:

create my_table ( id varchar2(20) not null
                , another_column not null
                , yet_another_column
                , constraint pk_my_table primary key (id) 
                );

create index i_my_table on my_table ( yet_another_column );

Now, if we were to select * from my_table where id = '1' this would / should do a unique index scan of the index pk_my_table. Then we re-enter the table, using the index, to return everything in my_table where id = '1'.

If the query were, instead, select id from my_table where id = 'a' then there is no need for the second stage as all the values we need are contained within the index. In this case the query would solely do an unique index scan.

Next, if our query were select * from my_table where yet_another_column = 'y' then we have an index on the column but it is not unique so we're going to have to look through the entire index to try to find all values that match our where condition, i.e. an index scan. Once again we're select columns that are not in our index so we have to re-enter the table to get them.

Lastly, if our query were select id from my_table where another_column = 'yes'. We have no index on another_column so we have to do a table scan to find the value, i.e. we have to find everything in the table where another_column = 'yes'.

Now, there might not seem to be much of a difference, between a table scan and an index scan in these instances. We still have to go and find a value in an object in the database. However, as the index is much smaller and specially designed to be scanned ( see other answers ) it is generally much faster to do an index scan if you only want a small proportion of the rows in the table. If you want say 10% of the table then this point becomes "it depends".

answered Oct 09 '22 16:10

Ben

For SQL Server at least:

An index scan can be faster because, presumably, the index doesn't cover the entire set of columns in the table, while a table (or clustered index) scan has to read all of the data. If an index does include all of the columns in the table, then it should be roughly equivalent to a table scan, and the choice between an index scan and table (or CIX) scan will be a coin toss. The difference is that when you have fewer columns in the index, you can fit more index rows on an 8kb page, leading to fewer overall pages you have to read in order to scan all of the data in the index.

To illustrate what I mean, imagine if you have two copies of the phone book, one with last name, first name, street address, and phone number, and one with just last name, first name, and phone number. Now imagine that because the street address doesn't have to be printed, you can fit two extra columns of names and phone numbers on any page in the phone book. The end result of this is that the phone book is thinner, because you can fit the same number of phone numbers on fewer pages. Next, imagine you are charged with counting the number of phone numbers in the book. Which would you choose, the one with the street address listed (which has more pages, analogous to a table scan) or the one without the street address (which has fewer pages, analogous to most index scans)? I would choose the one with fewer pages.

Another wrinkle in this is that some indexes can be filtered, meaning that not only do they have fewer columns in most cases (and therefore can fit more rows onto a single page), but they can also have a WHERE clause that eliminates a lot of rows. In this case, as well, an index scan will be better than a table scan (but this will only work for queries that have a matching WHERE clause and the same semantics).

answered Oct 09 '22 14:10

Aaron Bertrand

Related questions
                            
                                Greenplum vs PostgreSQL
                            
                                SQL Server "<>" operator is very slow compared to "=" on table with a few million rows
                            
                                how to select the most frequently appearing values? [duplicate]
                            
                                SQL SELECT INSERT INTO Generate Unique Id
                            
                                LEFT function in Oracle
                            
                                select data and sum of a column for a distinct column in data
                            
                                Oracle: not a valid month
                            
                                insert multiple rows using subquery
                            
                                How to select odd or even items from a row in SQL?
                            
                                Group Rails query into arrays by attribute
                            
                                Replace empty cells with NULL values in large number of columns
                            
                                How can I subtract two row's values within same column using sql query?
                            
                                SQLite database - select the data between two dates?
                            
                                Executing SQL query on multiple databases
                            
                                How do I protect this function from SQL injection?
                            
                                SQL primary key, INT or GUID or..? [duplicate]
                            
                                My SQL Dynamic query execute and get ouput into a variable in stored procedure
                            
                                Substitution inside verbatim string literals?
                            
                                Linq: how to exclude condition if parameter is null
                            
                                How to count all characters in all rows of a field in MySQL?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Table Scan and Index Scan in SQL

Tags:

sql

database-table

Manisha Awasthi

People also ask

4 Answers

dani herrera

Olivier Jacot-Descombes

Ben

Aaron Bertrand

Recent Activity

Donate For Us