I'm trying to understand the performance of database indexes in terms of Big-O notation. Without knowing much about it, I would guess that:
Is this generally correct ? Will querying on a primary key ever give worse performance than O(1) ? My specific concern is for SQLite, but I'd be interested in knowing to what extent this varies between different databases too.
Two main types of indexing methods are 1)Primary Indexing 2) Secondary Indexing. Primary Index is an ordered file which is fixed length size with two fields. The primary Indexing is also further divided into two types 1)Dense Index 2)Sparse Index.
Default IndexesB-Tree based indexes are created by default on any primary key, foreign key, and uniquely constrained fields. While other indexing methods are shipped with the major SQL distributions, B-Tree is by far the most common and will be the focus of the remainder of this article.
B-trees are the most commonly used data structures for indexes as they are time-efficient for lookups, deletions, and insertions. All these operations can be done in logarithmic time. Data that is stored inside of a B-tree can be sorted.
Most relational databases structure indices as B-trees.
If a table has a clustering index, the data pages are stored as the leaf nodes of the B-tree. Essentially, the clustering index becomes the table.
For tables w/o a clustering index, the data pages of the table are stored in a heap. Any non-clustered indices are B-trees where the leaf node of the B-tree identifies a particular page in the heap.
The worst case height of a B-tree is O(log n), and since a search is dependent on height, B-tree lookups run in something like (on the average)
O(logt n)
where t is the minimization factor ( each node must have at least t-1 keys and at most 2*t* -1 keys (e.g., 2*t* children).
That's the way I understand it.
And different database systems, of course, may well use different data structures under the hood.
And if the query does not use an index, of course, then the search is an iteration over the heap or B-tree containing the data pages.
Searches are a little cheaper if the index used can satisfy the query; otherwise, a lookaside to fetch the corresponding datapage in memory is required.
The indexed queries (unique or not) are more typically O(log n). Very simplistically, you can think of it as being similar to a binary search in a sorted array. More accurately, it depends on the index type. But a b-tree search, for example, is still O(log n).
If there is no index, then, yes, it is O(N).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With