Table with 80 million records and adding an index takes more than 18 hours (or forever)! Now what?

A short recap of what happened. I am working with 71 million records (not much compared to the billions of records processed by others). On a different thread, someone suggested that the current setup of my cluster is not suitable for my needs. My table structure is:

CREATE TABLE `IPAddresses` (
  `id` int(11) unsigned NOT NULL auto_increment,
  `ipaddress` bigint(20) unsigned default NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM;

And I added the 71 million records and then did a:

ALTER TABLE IPAddresses ADD INDEX(ipaddress); 

It's been 14 hours and the operation is still not complete. Upon Googling, I found that there is a well-known approach to this problem: partitioning. I understand that I now need to partition my table based on ipaddress, but can I do this without recreating the entire table, i.e., through an ALTER statement? If yes, there is one requirement: the column to be partitioned on must be part of the primary key. But I will be using the id of each ipaddress when constructing a different table, so ipaddress is not my primary key. How do I partition my table given this scenario?
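For reference, this is roughly the statement that gets rejected on the table above (the error text is paraphrased from the MySQL 5.x manual):

ALTER TABLE IPAddresses
  PARTITION BY HASH (ipaddress)
  PARTITIONS 20;
-- fails with ERROR 1503: A PRIMARY KEY must include all columns
-- in the table's partitioning function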

asked Sep 12 '10 by Legend




2 Answers

OK, it turns out that this was more than a simple create-a-table, index-it, and forget-it problem :) Here's what I did, in case someone else faces the same issue (I have used IP addresses as the example, but it works for other data types too):

Problem: Your table has millions of entries and you need to add an index really fast

Use case: Consider storing millions of IP addresses in a lookup table. Adding the IP addresses is not a big problem, but creating an index on them takes more than 14 hours.

Solution: Partition your table using MySQL's Partitioning strategy

Case #1: When the table you want is not yet created

CREATE TABLE IPADDRESSES (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ipaddress BIGINT UNSIGNED,
  PRIMARY KEY (id, ipaddress)
) ENGINE=MyISAM
PARTITION BY HASH (ipaddress)
PARTITIONS 20;
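As a usage note: the ipaddress column holds plain BIGINTs, and MySQL's built-in INET_ATON()/INET_NTOA() functions convert to and from dotted-quad notation (the sample address below is made up):

INSERT INTO IPADDRESSES (ipaddress) VALUES (INET_ATON('203.0.113.7'));
SELECT id, INET_NTOA(ipaddress) FROM IPADDRESSES
  WHERE ipaddress = INET_ATON('203.0.113.7');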

Case #2: When the table you want is already created. There seems to be a way to use ALTER TABLE to do this (an untested sketch follows at the end of this answer), but I have not yet figured out a proper solution. Instead, there is a slightly inefficient solution:

CREATE TABLE IPADDRESSES_TEMP (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ipaddress BIGINT UNSIGNED,
  PRIMARY KEY (id)
) ENGINE=MyISAM;

Insert your IP addresses into this table, and then create the actual table with partitions:

CREATE TABLE IPADDRESSES (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ipaddress BIGINT UNSIGNED,
  PRIMARY KEY (id, ipaddress)
) ENGINE=MyISAM
PARTITION BY HASH (ipaddress)
PARTITIONS 20;

And then, finally:

INSERT INTO IPADDRESSES (ipaddress)
  SELECT ipaddress FROM IPADDRESSES_TEMP;
DROP TABLE IPADDRESSES_TEMP;
ALTER TABLE IPADDRESSES ADD INDEX (ipaddress);
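To sanity-check that lookups actually prune to a single partition, you can use EXPLAIN PARTITIONS (available in the MySQL 5.1 to 5.6 series this setup dates from; the sample address is made up):

EXPLAIN PARTITIONS
  SELECT id FROM IPADDRESSES
  WHERE ipaddress = INET_ATON('203.0.113.7');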

And there you go... indexing on the new table took me about 2 hours on a 3.2 GHz machine with 1 GB of RAM :) Hope this helps.
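For completeness, the single-statement route via ALTER TABLE alluded to above would presumably look like the sketch below. It is untested on a table of this size, so treat it as an assumption rather than a tested recipe; note that MySQL still has to rebuild the whole table, and the primary key must first be extended to cover the partitioning column:

ALTER TABLE IPADDRESSES
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id, ipaddress);
ALTER TABLE IPADDRESSES
  PARTITION BY HASH (ipaddress)
  PARTITIONS 20;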

answered Sep 21 '22 by Legend

Creating indexes with MySQL is slow, but not that slow. With 71 million records, it should take a couple of minutes, not 14 hours. Possible problems are:

  • you have not configured sort buffer sizes and other configuration options; see http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_myisam_sort_buffer_size. If you try to generate a 1 GB index with an 8 MB sort buffer, it is going to take lots of passes; but if the buffer is larger than your CPU cache, it will get slower. So you have to test and see what works best (see the sketch after this list).

  • someone has a lock on the table
  • your IO system sucks
  • your server is swapping
  • etc

As usual, check iostat, vmstat, logs, etc. Issue a LOCK TABLES on your table to check whether someone has a lock on it.
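On the sort buffer point in the first bullet, a minimal sketch of raising it for the current session before indexing (the 256 MB value is an arbitrary assumption; tune it against your RAM and CPU cache):

SET SESSION myisam_sort_buffer_size = 256 * 1024 * 1024;  -- 256 MB, chosen arbitrarily
ALTER TABLE IPAddresses ADD INDEX (ipaddress);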

FYI, on my 64-bit desktop, creating an index on 10M random BIGINTs takes 17s...
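If you want to reproduce that kind of micro-benchmark, a rough sketch (the table name is made up; each self-insert doubles the row count, so roughly 23 to 24 rounds reach 10M rows):

CREATE TABLE bench (x BIGINT UNSIGNED) ENGINE=MyISAM;
INSERT INTO bench VALUES (FLOOR(RAND() * 1e18));
-- repeat the following doubling insert until the table holds ~10M rows
INSERT INTO bench SELECT FLOOR(RAND() * 1e18) FROM bench;
-- then time the index build
ALTER TABLE bench ADD INDEX (x);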

answered Sep 25 '22 by bobflux