Database indexes and their Big-O notation

Tags: big-o, sql, database, indexing, sqlite

I'm trying to understand the performance of database indexes in terms of Big-O notation. Without knowing much about it, I would guess that:

Querying on a primary key or unique index will give you an O(1) lookup time.
Querying on a non-unique index will also give an O(1) time, although maybe the '1' is slower than for the unique index (?)
Querying on a column without an index will give an O(N) lookup time (full table scan).

Is this generally correct? Will querying on a primary key ever give worse performance than O(1)? My specific concern is for SQLite, but I'd be interested in knowing to what extent this varies between different databases too.


asked Jan 14 '11 18:01

Michael Low

2 Answers

Most relational databases structure indices as B-trees.

If a table has a clustering index, the data pages are stored as the leaf nodes of the B-tree. Essentially, the clustering index becomes the table.

For tables without a clustering index, the data pages of the table are stored in a heap. Any non-clustered indices are B-trees whose leaf nodes point to the page in the heap that holds the row.

The worst-case height of a B-tree is O(log n), and since the cost of a search depends on the height, B-tree lookups run in something like

O(log_t n)

where t is the minimization factor (also called the minimum degree): each node other than the root must have at least t-1 keys and at most 2t-1 keys (i.e., at most 2t children).
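To put numbers on that bound, here is a minimal Python sketch. It relies on the standard textbook fact that a B-tree of height h (root at height 0) holds at least 2*t^h - 1 keys, which gives h <= log_t((n + 1) / 2). The function name `max_btree_height` is made up for this illustration and is not part of any database API.

```python
def max_btree_height(n_keys: int, t: int) -> int:
    """Tallest height a B-tree with factor t can have while holding n_keys keys.

    A B-tree of height h (root at height 0) holds at least 2 * t**h - 1 keys,
    so the tallest possible tree is the largest h with 2 * t**h - 1 <= n_keys.
    """
    if n_keys < 1 or t < 2:
        raise ValueError("need n_keys >= 1 and t >= 2")
    h = 0
    # Grow the height while a tree one level taller could still be that sparse.
    while 2 * t ** (h + 1) - 1 <= n_keys:
        h += 1
    return h


# One million keys: a lookup visits at most height + 1 nodes.
print(max_btree_height(1_000_000, 2))    # 18
print(max_btree_height(1_000_000, 100))  # 2
```

A larger t means wider nodes and a much shorter tree, which is why a disk-based index with hundreds of keys per page usually needs only a handful of page reads per lookup.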

That's the way I understand it.

And different database systems, of course, may well use different data structures under the hood.

And if the query cannot use an index, then the search is a full iteration over the heap or B-tree containing the data pages, which is O(n).

Searches are a little cheaper if the index used can satisfy the query by itself (a covering index); otherwise, an extra lookup is required to fetch the corresponding data page.
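Since the question is about SQLite, one way to see these cases is to ask the planner directly with EXPLAIN QUERY PLAN. The sketch below uses Python's built-in sqlite3 module; the `users` table, the `idx_users_email` index, and the `query_plan` helper are hypothetical names invented for the example. The exact wording of the plan text varies between SQLite versions, so treat the output as indicative only.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return the planner's description of how SQLite would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each row.
    return "; ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT);
    CREATE INDEX idx_users_email ON users (email);
    """
)

plans = {
    # Lookup through the table's own B-tree.
    "primary key": query_plan(conn, "SELECT name FROM users WHERE id = 42"),
    # Index lookup, then an extra fetch of the row to read `name`.
    "index + row fetch": query_plan(
        conn, "SELECT name FROM users WHERE email = 'a@example.com'"
    ),
    # The index alone holds everything the query needs.
    "covering index": query_plan(
        conn, "SELECT email FROM users WHERE email = 'a@example.com'"
    ),
    # No usable index on `name`, so every row is visited.
    "no index": query_plan(conn, "SELECT id FROM users WHERE name = 'Ann'"),
}

for label, plan in plans.items():
    print(f"{label:18} {plan}")
```

The first three plans should be reported as a SEARCH (a B-tree descent) and the last as a SCAN (a full pass over the table); the third should mention a covering index.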

answered Oct 03 '22 22:10

Nicholas Carey

Indexed queries (unique or not) are more typically O(log n). Very simplistically, you can think of a lookup as being similar to a binary search in a sorted array. More accurately, the cost depends on the index type, but a B-tree search, for example, is still O(log n).

If there is no index, then, yes, it is O(N).
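As a rough illustration of the gap between the two, the Python sketch below counts how many keys each approach examines in a sorted list of one million integers. It is only an analogy: a real index walks B-tree pages rather than a flat array, and the function names are made up for this example.

```python
def linear_probes(sorted_keys, target):
    """Count keys examined by a front-to-back scan (the no-index case)."""
    probes = 0
    for key in sorted_keys:
        probes += 1
        if key == target:
            break
    return probes


def binary_probes(sorted_keys, target):
    """Count keys examined by a binary search (a stand-in for an index lookup)."""
    lo, hi, probes = 0, len(sorted_keys) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            break
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes


keys = list(range(1_000_000))
worst_case_target = keys[-1]  # the last key is the worst case for the scan

print(linear_probes(keys, worst_case_target))  # 1000000
print(binary_probes(keys, worst_case_target))  # at most 20
```

Doubling the number of rows doubles the work for the scan but adds only one more probe to the binary search.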

answered Oct 03 '22 20:10

Mark Wilkins
