Generating a massive 150M-row MySQL table

I have a C program that mines a huge data source (20GB of raw text) and generates loads of INSERTs to execute on a simple blank table (4 integer columns with 1 primary key). Set up as a MEMORY table, the entire task completes in 8 hours, leaving about 150 million rows in the table. Eight hours is a completely decent number for me; this is a one-time deal.
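For reference, a minimal sketch of the kind of staging table being described (the column names and types are assumptions; the question only says four integer columns with one primary key, and a MEMORY table of this size also needs a raised max_heap_table_size):

-- Sketch of the in-memory staging table; names/types are assumptions.
SET SESSION max_heap_table_size = 12 * 1024 * 1024 * 1024;  -- allow a ~12GB MEMORY table

CREATE TABLE memtable (
    hash BIGINT UNSIGNED NOT NULL,   -- primary key: a hash of the mined record
    a    INT NOT NULL,
    b    INT NOT NULL,
    val  INT NOT NULL,
    PRIMARY KEY (hash)
) ENGINE = MEMORY;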

The problem comes when trying to convert the MEMORY table back into MyISAM so that (A) I'll have the memory freed up for other processes and (B) the data won't be killed when I restart the computer.

ALTER TABLE memtable ENGINE = MyISAM

I let this ALTER TABLE query run for over two days and it still wasn't done, so I've now killed it.

If I create the table as MyISAM from the start, the write speed is terribly poor (especially because the query has to use the ON DUPLICATE KEY UPDATE technique). I can't temporarily turn off the keys either: if I did, every raw row would be inserted instead of collapsed, the table would grow to over 1,000 times its size, and I'd then have to reprocess the keys and essentially run a GROUP BY on 150,000,000,000 rows. Umm, no.

One of the key constraints to realize: the INSERT query UPDATEs records if the primary key (a hash) already exists in the table.

At the very beginning of an attempt at strictly using MyISAM, I'm getting a rough speed of 1,250 rows per second. Once the index grows, I imagine this rate will only get worse.


I have 16GB of memory installed in the machine. What's the best way to generate a massive table that ultimately ends up as an on-disk, indexed MyISAM table?


Clarification: There are many, many UPDATEs going on from the query (INSERT ... ON DUPLICATE KEY UPDATE val=val+whatever). This isn't, by any means, a raw dump problem. My reasoning for trying a MEMORY table in the first place was to speed up all the index lookups and table changes that occur for every INSERT.
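Concretely, each statement looks roughly like the following (a hedged sketch reusing the assumed column names from above; the real columns and increments are whatever the C program emits):

-- Hedged example of the upsert pattern described; names and values are placeholders.
INSERT INTO memtable (hash, a, b, val)
VALUES (123456789, 1, 2, 10)
ON DUPLICATE KEY UPDATE val = val + VALUES(val);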

asked Oct 19 '09 by brianreavis



1 Answer

If you intend to make it a MyISAM table, why are you creating it in memory in the first place? If it's only for speed, I think the conversion to a MyISAM table is going to negate any speed improvement you get by creating it in memory to start with.

You say inserting directly into an "on disk" table is too slow (though I'm not sure how you're judging that when your current method takes days), but you may be able to turn off or remove the uniqueness constraints, use a DELETE query later to re-establish uniqueness, and then re-enable/add the constraints. I have used this technique when importing into an InnoDB table in the past, and found that even with the later delete it was much faster overall.
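A rough sketch of that pattern, with assumed table/column names and an auxiliary auto-increment id added purely as a tiebreaker for the dedupe step (note that a plain DELETE keeps one arbitrary row per hash, so it suits a pure dedupe rather than the val=val+... accumulation the question describes):

-- Sketch: load without the constraint, dedupe afterwards, then re-add the key.
ALTER TABLE worktable DROP PRIMARY KEY;   -- or create the table without it

-- ... run the bulk INSERTs here as plain INSERTs ...

-- Remove duplicate hashes, keeping the lowest id for each:
DELETE t1
FROM worktable t1
JOIN worktable t2
  ON t1.hash = t2.hash
 AND t1.id > t2.id;

-- Re-establish the constraint:
ALTER TABLE worktable ADD PRIMARY KEY (hash);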

Another option might be to create a CSV file instead of the INSERT statements, and either load it into the table using LOAD DATA INFILE (I believe that is faster than individual INSERTs, but I can't find a reference at present) or use it directly via the CSV storage engine, depending on your needs.
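For the first option, a minimal sketch of the load (the file path, delimiters, and column names are assumptions; also note that LOAD DATA can only REPLACE or IGNORE duplicate keys, it cannot accumulate values the way ON DUPLICATE KEY UPDATE does):

-- Minimal sketch of bulk-loading a CSV produced by the C program; path and names are assumptions.
LOAD DATA INFILE '/tmp/mined_rows.csv'
IGNORE INTO TABLE finaltable
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(hash, a, b, val);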

answered Oct 05 '22 by Brenton Alker