Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Database indexes and their Big-O notation

I'm trying to understand the performance of database indexes in terms of Big-O notation. Without knowing much about it, I would guess that:

  • Querying on a primary key or unique index will give you a O(1) lookup time.
  • Querying on a non-unique index will also give a O(1) time, albeit maybe the '1' is slower than for the unique index (?)
  • Querying on a column without an index will give a O(N) lookup time (full table scan).

Is this generally correct ? Will querying on a primary key ever give worse performance than O(1) ? My specific concern is for SQLite, but I'd be interested in knowing to what extent this varies between different databases too.

like image 311
Michael Low Avatar asked Jan 14 '11 18:01

Michael Low


People also ask

How many types of indexes are there in database?

Two main types of indexing methods are 1)Primary Indexing 2) Secondary Indexing. Primary Index is an ordered file which is fixed length size with two fields. The primary Indexing is also further divided into two types 1)Dense Index 2)Sparse Index.

What is the default indexing method used in databases?

Default IndexesB-Tree based indexes are created by default on any primary key, foreign key, and uniquely constrained fields. While other indexing methods are shipped with the major SQL distributions, B-Tree is by far the most common and will be the focus of the remainder of this article.

What is the most common data structure used for DB indexes?

B-trees are the most commonly used data structures for indexes as they are time-efficient for lookups, deletions, and insertions. All these operations can be done in logarithmic time. Data that is stored inside of a B-tree can be sorted.


2 Answers

Most relational databases structure indices as B-trees.

If a table has a clustering index, the data pages are stored as the leaf nodes of the B-tree. Essentially, the clustering index becomes the table.

For tables w/o a clustering index, the data pages of the table are stored in a heap. Any non-clustered indices are B-trees where the leaf node of the B-tree identifies a particular page in the heap.

The worst case height of a B-tree is O(log n), and since a search is dependent on height, B-tree lookups run in something like (on the average)

O(logt n)

where t is the minimization factor ( each node must have at least t-1 keys and at most 2*t* -1 keys (e.g., 2*t* children).

That's the way I understand it.

And different database systems, of course, may well use different data structures under the hood.

And if the query does not use an index, of course, then the search is an iteration over the heap or B-tree containing the data pages.

Searches are a little cheaper if the index used can satisfy the query; otherwise, a lookaside to fetch the corresponding datapage in memory is required.

like image 87
Nicholas Carey Avatar answered Oct 03 '22 22:10

Nicholas Carey


The indexed queries (unique or not) are more typically O(log n). Very simplistically, you can think of it as being similar to a binary search in a sorted array. More accurately, it depends on the index type. But a b-tree search, for example, is still O(log n).

If there is no index, then, yes, it is O(N).

like image 34
Mark Wilkins Avatar answered Oct 03 '22 20:10

Mark Wilkins