 

Will adding an index on a table of 2 million records be twice as slow as the same table with 1 million records?

Tags: indexing, mysql

I have a table with 70 million records and there is an index missing. I want to estimate how long adding the index will take, without making a copy of the table and timing the index build on the copy.

I am just wondering whether it will be twice as slow (linear) or whether the time grows exponentially.

Database: MySQL 5.0

Thanks a lot

Michael Koper asked Mar 30 '11


People also ask

Why is it not recommended to create indexes on small tables?

Table Size. It is not recommended to create indexes on small tables, since it takes the SQL Server engine less time to scan the underlying table than to traverse the index when searching for specific data.

How long does it take to create an index on a large table?

Creating indexes on a very large table can take over 5 hours.

How does indexing improve performance?

Indexing makes columns faster to query by creating pointers to where data is stored within a database. Imagine you want to find a piece of information that is within a large database. To get this information out of the database the computer will look through every row until it finds it.
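
As a minimal sketch (table and column names here are made up for illustration), creating an index and then hitting it with a query looks like this in MySQL:

    -- hypothetical table and column names, for illustration only
    CREATE INDEX idx_users_email ON users (email);

    -- with the index in place, this lookup can follow the index pointers
    -- instead of scanning every row of the table
    SELECT id, email FROM users WHERE email = 'someone@example.com';

    -- EXPLAIN shows whether MySQL actually chose the index
    EXPLAIN SELECT id, email FROM users WHERE email = 'someone@example.com';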

How many indexes can we create on a table?

Each table can have up to 999 nonclustered indexes, regardless of how the indexes are created: either implicitly with PRIMARY KEY and UNIQUE constraints, or explicitly with CREATE INDEX.
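
For illustration (hypothetical table; the same DDL works in both MySQL and SQL Server), the first two indexes below are created implicitly by constraints and the third explicitly:

    -- PRIMARY KEY and UNIQUE create indexes implicitly
    CREATE TABLE accounts (
        id INT NOT NULL PRIMARY KEY,
        email VARCHAR(255) UNIQUE,
        created_at DATETIME
    );

    -- an explicit index via CREATE INDEX
    CREATE INDEX idx_accounts_created_at ON accounts (created_at);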


1 Answer

(Disclaimer: I have minimal experience on MySQL)

It should be somewhere in-between.

The absolute lowest complexity of the whole operation is that of just reading all records in order, which is a linear process - O(n). This is an I/O-bound operation and there is not much that can be done about it - the caching done by most modern operating systems may help, but only for a database that is already in use and fits in the available memory.

In most SQL engines, indexes are some variation of a B-tree. The CPU complexity of inserting a single record into such a tree is roughly O(log(n)), where n is the size of the tree. For n records, the total complexity of the operation should therefore be O(n log(n)).
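
To put the original question in those terms: building the index for 2,000,000 records versus 1,000,000 records gives a ratio of (2,000,000 · log2(2,000,000)) / (1,000,000 · log2(1,000,000)) ≈ 2 × 20.9/19.9 ≈ 2.1, so under this rough model doubling the row count makes the build roughly twice as slow - close to linear, not exponential.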

Of course, it's not quite that simple. Computing the index tree is not really CPU-heavy and since the index pages should fit in RAM on any modern system, the operation of inserting a single node when the tree is not rebalanced would be close to O(1) time-wise: a single disk operation to update a leaf page of the index.

Since the tree does get rebalanced, however, things are probably a bit more complex. Multiple index pages may have to be committed to disk, thus increasing the time needed. As a rough guess, I'd say O(n log(n)) is a good start...

It should never come anywhere close to an exponential complexity, though.

EDIT:

It just occurred to me that 70,000,000 B-tree entries may not, in fact, fit in the in-memory cache. It would depend heavily on what is being indexed. INTEGER columns would probably be fine, but TEXT columns are another story altogether. If the average field length is 100 bytes (e.g. HTTP links or 30 characters of non-English UTF-8 text), you'd need more than 7GB of memory to store the index.
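
If you want a rough number for your own table before starting, MySQL will tell you how big the existing indexes already are and how much cache memory the storage engine has to work with ('orders' below is a placeholder table name):

    -- Index_length in the output is the total size, in bytes, of the table's current indexes
    SHOW TABLE STATUS LIKE 'orders';

    -- the cache the storage engine uses for index pages:
    SHOW VARIABLES LIKE 'key_buffer_size';          -- MyISAM
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';  -- InnoDB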

Bottom line:

  • If the index fits in the cache, then since building the index should be a single DB transaction, it would be I/O-bound and roughly linear: all the records have to be parsed and then the index itself has to be written out to permanent storage.

  • If the index does not fit in the cache, then the complexity rises, as I/O wait-times on the index itself become involved in each operation.
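
For reference, the operation itself would be a single statement along the lines below ('orders' and 'customer_id' are placeholder names). On MySQL 5.0, adding an index generally copies and rebuilds the table, so the full I/O cost discussed above applies and writes to the table are blocked while it runs:

    -- placeholder table/column names; MySQL 5.0 rebuilds the table to add the index
    ALTER TABLE orders ADD INDEX idx_customer_id (customer_id);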

thkala answered Sep 23 '22