Can anyone tell me how adding a key scales in MySQL? I have 500,000,000 rows in a table, trans, with columns i (INT UNSIGNED), j (INT UNSIGNED), nu (DOUBLE), A (DOUBLE). I try to index a column, e.g.
ALTER TABLE trans ADD KEY idx_A (A);
and I wait. For a table of 14,000,000 rows it took about 2 minutes to execute on my MacBook Pro, but for the whole half a billion, it's taking 15hrs and counting. Am I doing something wrong, or am I just being naive about how indexing a database scales with the number of rows?
Yes, it depends on many factors, and there's no way to really know unless you try. I just built an index on a single column of type BIGINT over 200+ million rows, however, and it took about 3 minutes. Sure, my hardware is different from yours, my database version is different from yours, etc.
Indexes reduce write performance. When a column covered by an index is updated, that index must also be updated. Similarly, any delete or insert requires updating the relevant indexes. The disk space and write penalties of indexes are precisely why you need to be careful about creating them.
Suppose it takes 3600 seconds to create an index on table X, which has 3 million rows. The metric is then 3600 / 3,000,000 = 0.0012 seconds per row. So if table Y has 8 million rows, you could expect 0.0012 * 8,000,000 = 9600 seconds (or 160 minutes) to create the index, assuming the time grows linearly with the row count.
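The per-row arithmetic above can be sketched in a few lines of Python. This is only an illustration of the linear rule of thumb; the function names are made up for the example, and the second calculation simply plugs in the figures from the question (2 minutes for 14 million rows, 500 million rows in total).

```python
def seconds_per_row(build_seconds, rows):
    """Observed index build time divided by row count."""
    return build_seconds / rows


def linear_estimate(rate, target_rows):
    """Extrapolate build time assuming cost grows linearly with rows."""
    return rate * target_rows


# Table X -> table Y, using the numbers from the answer above.
rate_x = seconds_per_row(3600, 3_000_000)
estimate_y = linear_estimate(rate_x, 8_000_000)
print(f"Table Y: about {estimate_y:.0f} s ({estimate_y / 60:.0f} min)")

# The same rule applied to the figures in the question.
rate_q = seconds_per_row(2 * 60, 14_000_000)
estimate_q = linear_estimate(rate_q, 500_000_000)
print(f"trans: about {estimate_q:.0f} s ({estimate_q / 60:.1f} min)")
```

For the table in the question, the linear rule predicts a little over an hour, far less than the 15 hours observed, which suggests that something other than row count alone (memory, for instance) is dominating.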
There are a couple of factors to consider:
Since the factor is about 30 in size, the nominal sort time for the big data set would be of the order of 50 times as long, i.e. under two hours. However, you need 8 bytes per data value and about another 8 bytes of overhead (that's a guess; adjust it if you know more about what MySQL stores in an index). So, 14M × 16 ≈ 220 MB of main memory, but 500M × 16 ≈ 8 GB of main memory. Unless your machine has that much memory to spare (and MySQL is configured to use it), the big sort is spilling to disk, and that accounts for a lot of the rest of the time.
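A small Python sketch of that estimate follows. It assumes the index build is dominated by a comparison sort costing roughly n log n, and it reuses the answer's guess of 16 bytes per index entry; both are assumptions for illustration, not measured MySQL behaviour.

```python
import math


def nlogn_ratio(small_rows, big_rows):
    """How much longer an n*log(n) sort takes on big_rows than on small_rows."""
    return (big_rows * math.log2(big_rows)) / (small_rows * math.log2(small_rows))


small_rows = 14_000_000
big_rows = 500_000_000
small_minutes = 2.0      # build time reported for the 14M-row table
bytes_per_entry = 16     # 8-byte DOUBLE + ~8 bytes overhead (a guess)

ratio = nlogn_ratio(small_rows, big_rows)
est_minutes = small_minutes * ratio

small_mb = small_rows * bytes_per_entry / 1e6
big_gb = big_rows * bytes_per_entry / 1e9

print(f"sort-time ratio: about {ratio:.0f}x, i.e. about {est_minutes:.0f} minutes")
print(f"sort working set: {small_mb:.0f} MB vs {big_gb:.0f} GB")
```

The ratio comes out somewhat below 50, consistent with the "under two hours" figure, while the working set jumps from a couple of hundred megabytes to several gigabytes.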
Firstly, your table definition could make a big difference here. If you don't need NULL values in your columns, define them NOT NULL. This will save space in the index, and presumably time while creating it.
CREATE TABLE x (
    i INTEGER UNSIGNED NOT NULL,
    j INTEGER UNSIGNED NOT NULL,
    nu DOUBLE NOT NULL,
    A DOUBLE NOT NULL
);
As for the time taken to create the indexes, this requires a table scan and will show up as REPAIR BY SORTING. It should be quicker in your case (i.e. a massive data set) to create a new table with the required indexes and insert the data into it, as this will avoid the REPAIR BY SORTING operation because the indexes are built sequentially on insert. There is a similar concept explained in this article.
CREATE DATABASE trans_clone;
CREATE TABLE trans_clone.trans LIKE originalDB.trans;
ALTER TABLE trans_clone.trans ADD KEY idx_A (A);
Then script the insert into chunks (as per the article), or dump the data using mysqldump:
mysqldump originalDB trans --extended-insert --skip-add-drop-table --no-create-db --no-create-info > originalDB.trans.sql
mysql trans_clone < originalDB.trans.sql
This will insert the data, but will not require an index rebuild (the index is built as each row is inserted) and should complete much faster.
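For the chunked-insert route, here is a rough Python sketch that only generates the INSERT ... SELECT statements, one per key range. It assumes column i is a reasonable key to chunk on and that the range of i is known; the function name, chunk size and key bounds are all hypothetical. Without an index on i in the source table, each chunk would scan the whole table, so treat this as an outline rather than a tuned script.

```python
def chunk_statements(min_key, max_key, chunk_size, key="i",
                     src="originalDB.trans", dst="trans_clone.trans"):
    """Yield one INSERT ... SELECT per half-open range [start, end) of `key`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = min_key
    while start <= max_key:
        end = start + chunk_size
        yield (f"INSERT INTO {dst} SELECT * FROM {src} "
               f"WHERE {key} >= {start} AND {key} < {end};")
        start = end


# Hypothetical key range and chunk size, purely for illustration.
statements = list(chunk_statements(0, 2_499_999, 1_000_000))
for stmt in statements:
    print(stmt)
```

The printed statements could be piped into the mysql client one at a time, so that each chunk commits separately.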