Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When does a database table get large enough that an index is beneficial?

Hypothetically, in a SQL Server database, if I have a table with two int fields (say a many-to-many relation) that participates in joins between two other tables, at what approximate size does the table become large enough where the performance benefit of indexes on the two int fields overcomes the overhead imposed by said indexes?

Are there differences in architecture between different versions of SQL Server that would substantially change this answer?

like image 922
Robert Harvey Avatar asked Jun 22 '09 17:06

Robert Harvey


3 Answers

For the queries involving small portions of the table rows, indexes are always beneficial, be there 100 rows or 1,000,000.

See this entry in my blog for examples with plans and performance details:

  • Indexing tiny tables

The queries like this:

SELECT  *
FROM    table1 t1
JOIN    table2 t2
ON      t2.col = t1.col

will most probably use HASH JOIN. A hash table for the smaller table will be built, and the rows from the larger table will be used to probe the hash table.

To do this, no index is needed.

However, this query:

SELECT  *
FROM    table1 t1
JOIN    table2 t2
ON      t2.col = t1.col
WHERE   t1.othercol = @value

will use NESTED LOOPS: the rows from the outer table (table1) will be searched using an index on table1.othercol, and the rows from the inner table (table2) will be searched using an index on table2.col.

If you don't have an index on col1, a HASH JOIN will be used which requires scanning all rows from both tables and some more resources to built a hash table.

Indexes are also useful for the queries like this:

SELECT  t2.col
FROM    table1 t1
JOIN    table2 t2
ON      t2.col = t1.col

, in which case the engine doesn't need to read table2 itself at all: eveything you need for this query can be found in the index, which can be much smaller than the table itself and more efficient to read.

And, of course, if you need your data sorted and have indexes on both table1.col and table2.col, then the following query:

SELECT  *
FROM    table1 t1
JOIN    table2 t2
ON      t2.col = t1.col
ORDER BY
        t2.col

will probably use MERGE JOIN method, which is super fast if both input rowset are sorted, and its output is also sorted, which means that ORDER BY comes free.

Note that even if you don't have an index, an optimizer may choose to Eager Spool your small table, which means building a temporary index for the duration of the query and dropped the index after the query finishes.

If the query is small, it will be very fast, but again, an index won't hurt (for SELECT queries I mean). If the optimizer won't need it, it just will not be used.

Note, though, that creating an index may affect DML performance, but it's other story.

like image 103
Quassnoi Avatar answered Sep 18 '22 12:09

Quassnoi


The penalty for insertion will be negligible until long after the benefit of the indexes will appear. The optimizer is smart enough to ignore the indexes anyway until that point clicks in. So just index the table from the start.

like image 28
dkretz Avatar answered Sep 18 '22 12:09

dkretz


Regardless of size, there is always a performance benefit to using an index when doing a lookup.

Regarding overhead, the question becomes: what overhead do you mean, and how do you relate it to the value of a lookup? The two are separate values, after all.

There are two forms of overhead for an index: space (which is usually negligible, depending on how the index is structured), and re-index on insert (the server must recalculate an index after every insert).

As I mentioned, the space issue probably isn't that big a deal. But re-indexing is. Fortunately, you need to be doing lots of near-continuous inserting before that form of overhead becomes a problem.

So bottom line: You're almost always better off having an index. Start from that position and wait until re-indexing becomes a bottleneck. Then you can look into alternatives.

like image 25
Randolpho Avatar answered Sep 21 '22 12:09

Randolpho