 

MySQL - how long to create an index?

Tags: indexing, mysql

Can anyone tell me how adding a key scales in MySQL? I have 500,000,000 rows in a table, trans, with columns i (INT UNSIGNED), j (INT UNSIGNED), nu (DOUBLE), A (DOUBLE). I am trying to index a column, e.g.

ALTER TABLE trans ADD KEY idx_A (A); 

and I wait. For a table of 14,000,000 rows it took about 2 minutes to execute on my MacBook Pro, but for the whole half a billion rows it has been running for 15 hours and counting. Am I doing something wrong, or am I just being naive about how indexing a table scales with the number of rows?

xnx asked Mar 20 '10 13:03



2 Answers

There are a couple of factors to consider:

  • Sorting is an O(N log N) operation.
  • The sort for 14M rows might well fit in main memory; the sort for 500M rows probably doesn't, so it spills to disk, which slows things down enormously.

Since the row count grows by a factor of about 36 (500M / 14M), the nominal N log N sort time for the big data set would be of the order of 45 times as long, which is still under two hours. However, you need 8 bytes per data value and about another 8 bytes of overhead (that's a guess; adjust it if you know more about what MySQL stores in an index). So 14M × 16 bytes ≈ 220 MB of main memory, but 500M × 16 bytes ≈ 8 GB. Unless your machine has that much memory to spare (and MySQL is configured to use it), the big sort spills to disk, and that accounts for much of the remaining time.
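To make the arithmetic above easy to check, here is a minimal Python sketch. It assumes the same things the answer does: an idealised N log N sort cost, the 2-minute timing reported for 14M rows, and a guessed 16 bytes per index entry (8 for the DOUBLE plus 8 of overhead). The function names are made up for illustration and nothing here measures what MySQL actually does.

```python
import math


def nlogn_ratio(small_rows, large_rows):
    """Ratio of idealised N*log(N) sort costs between two row counts."""
    small_cost = small_rows * math.log(small_rows)
    large_cost = large_rows * math.log(large_rows)
    return large_cost / small_cost


def index_sort_bytes(rows, value_bytes=8, overhead_bytes=8):
    """Rough memory needed to sort the index entries in RAM.

    overhead_bytes is a guess carried over from the answer, not a MySQL figure.
    """
    return rows * (value_bytes + overhead_bytes)


SMALL_ROWS = 14_000_000      # table that took about 2 minutes
LARGE_ROWS = 500_000_000     # full table
SMALL_MINUTES = 2            # timing reported in the question

ratio = nlogn_ratio(SMALL_ROWS, LARGE_ROWS)
print(f"N log N cost ratio: {ratio:.1f}")
print(f"Nominal in-memory estimate: {SMALL_MINUTES * ratio:.0f} minutes")
print(f"Sort memory, small table: {index_sort_bytes(SMALL_ROWS) / 1e6:.0f} MB")
print(f"Sort memory, large table: {index_sort_bytes(LARGE_ROWS) / 1e9:.0f} GB")
```

If the estimate from pure N log N scaling is far below the observed 15+ hours, as it is here, the gap points at the sort no longer fitting in memory rather than at the algorithmic growth itself.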

Jonathan Leffler answered Sep 24 '22 14:09


Firstly, your table definition could make a big difference here. If you don't need NULL values in your columns, define them NOT NULL. This will save space in the index, and presumably time while creating it.

CREATE TABLE x (
    i INTEGER UNSIGNED NOT NULL,
    j INTEGER UNSIGNED NOT NULL,
    nu DOUBLE NOT NULL,
    A DOUBLE NOT NULL
);

As for the time taken to create the index: this requires a table scan and will show up as REPAIR BY SORTING. In your case (i.e. a massive data set) it should be quicker to create a new table with the required indexes and insert the data into it. That avoids the REPAIR BY SORTING operation, because the indexes are built incrementally as the rows are inserted. There is a similar concept explained in this article.

CREATE DATABASE trans_clone;
CREATE TABLE trans_clone.trans LIKE originalDB.trans;
ALTER TABLE trans_clone.trans ADD KEY idx_A (A);

Then script the insert in chunks (as per the article), or dump the data using mysqldump and load it into the new table:

mysqldump originalDB trans --extended-insert --skip-add-drop-table --no-create-db --no-create-info > originalDB.trans.sql
mysql trans_clone < originalDB.trans.sql

This will insert the data, but will not require an index rebuild (the index is built as each row is inserted) and should complete much faster.
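For the "script the insert in chunks" option, one possible shape is sketched below in Python. It is not the method from the linked article, which isn't reproduced here. It only generates INSERT ... SELECT statements, and it assumes that column i is NOT NULL and can be used to split the table into ranges; the range bounds, the chunk size and the function names are all hypothetical.

```python
def chunk_ranges(low, high, chunk_size):
    """Yield inclusive, non-overlapping (start, end) ranges covering low..high."""
    start = low
    while start <= high:
        end = min(start + chunk_size - 1, high)
        yield start, end
        start = end + 1


def chunked_insert_statements(low, high, chunk_size,
                              src="originalDB.trans",
                              dst="trans_clone.trans",
                              key="i"):
    """Yield one INSERT ... SELECT per range of the (assumed) chunking column."""
    for start, end in chunk_ranges(low, high, chunk_size):
        yield (f"INSERT INTO {dst} SELECT * FROM {src} "
               f"WHERE {key} BETWEEN {start} AND {end};")


if __name__ == "__main__":
    # Tiny demo range; real bounds would come from SELECT MIN(i), MAX(i).
    for statement in chunked_insert_statements(0, 9, 4):
        print(statement)
```

The printed statements can be piped into the mysql client one batch at a time. Because the ranges are inclusive and do not overlap, each row is copied exactly once, provided every value of i falls between the chosen bounds.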

Andy answered Sep 24 '22 14:09