 

Large primary key: 1+ billion rows MySQL + InnoDB?

I was wondering whether InnoDB is the best engine for this table. The table contains a single field, the primary key, and will gain an estimated 816k rows a day, so it will get very large very quickly. I'm also working on a file-based storage approach (would that be faster?). The table will store Twitter IDs that have already been processed.

Also, is there any estimate of the memory usage of a SELECT MIN(id) statement? Any other ideas are greatly appreciated!

James Hartig asked Dec 13 '08 16:12


2 Answers

I'd recommend partitioning your table by ID or date. Partitioning splits a large table into several smaller tables according to some defined logic (such as date ranges), which makes them much more manageable in terms of performance and memory. MySQL 5.1 has this feature built in, or you can implement it with a custom solution.
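As a rough illustration of range partitioning by ID, here is a small Python sketch that builds the kind of `CREATE TABLE ... PARTITION BY RANGE` statement MySQL 5.1+ accepts. The table name `processed_ids`, the `BIGINT UNSIGNED` column type, and the boundary values are all assumptions made for the example, not details from the question.

```python
def build_partitioned_ddl(table, boundaries):
    """Return a CREATE TABLE statement that range-partitions `table` by id.

    `boundaries` are the exclusive upper bounds of each partition; a final
    catch-all partition is appended for every id above the last bound.
    """
    if not boundaries or list(boundaries) != sorted(set(boundaries)):
        raise ValueError("boundaries must be non-empty, unique, and ascending")

    parts = [
        f"    PARTITION p{i} VALUES LESS THAN ({upper})"
        for i, upper in enumerate(boundaries)
    ]
    parts.append("    PARTITION pmax VALUES LESS THAN MAXVALUE")

    return (
        f"CREATE TABLE {table} (\n"
        "    id BIGINT UNSIGNED NOT NULL,\n"  # assumed column type
        "    PRIMARY KEY (id)\n"
        ") ENGINE=InnoDB\n"
        "PARTITION BY RANGE (id) (\n"
        + ",\n".join(parts)
        + "\n);"
    )


# Hypothetical table name and boundaries, chosen only for illustration.
ddl = build_partitioned_ddl(
    "processed_ids", [250_000_000, 500_000_000, 750_000_000]
)
print(ddl)
```

Since the partitioning column here is the primary key itself, it satisfies MySQL's rule that the partitioning column must be part of every unique key on the table. Partitioning by date instead would require adding a date column and including it in the primary key.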

If you implement storage in a flat file, you lose all the advantages of a database: you can no longer run queries against the data.

Eran Galperin answered Oct 16 '22 00:10


The only definitive answer is to try both, test, and see what happens.

Generally, MyISAM is faster for writes and for reads, but not for both at the same time. When you write to a MyISAM table, the entire table is locked until the insert completes. InnoDB has more overhead but uses row-level locking, so reads and writes can happen concurrently without the problems that MyISAM's table locking incurs.

However, your problem, if I understand it correctly, is a little different. With only one column, and that column being the primary key, the different ways MyISAM and InnoDB handle primary key indexes become an important consideration.

In MyISAM, the primary key index is just like any other secondary index. Internally each row has a row id and the index nodes just point to the row ids of the data pages. A primary key index is not handled differently than any other index.

In InnoDB, however, primary keys are clustered, meaning they stay attached to the data pages and ensure that the row contents remain in physically sorted order on disk according to the primary key (but only within single data pages, which themselves could be scattered in any order.)

This being the case, I would expect InnoDB to have an advantage, because MyISAM would essentially have to do double work: write the integer once in the data pages, and then write it again in the index pages. InnoDB wouldn't do this. Its primary key index is the data pages themselves, so it would only have to write once and manage the data in one place, where MyISAM would needlessly have to manage two copies.

For either storage engine, operations like min() or max() on an indexed column, or checking whether a number exists in the index, should be trivial. Since the table has only one column, no bookmark lookups would even be necessary, as the data would be represented entirely within the index itself. This should be a very efficient index.
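To see why these operations are cheap on sorted data, here is a toy Python model. It is only an analogy: a sorted list stands in for the index, whereas the real engines use B-tree pages, and the ID values are made up.

```python
import bisect

# Toy stand-in for an index: IDs kept in sorted order (made-up values).
index = sorted([9007, 1204, 5531, 3310, 7782])


def smallest(sorted_ids):
    # The minimum is simply the first entry, so no scan is needed.
    return sorted_ids[0]


def contains(sorted_ids, target):
    # Binary search: the number of comparisons grows with log(n),
    # not with the number of stored IDs.
    pos = bisect.bisect_left(sorted_ids, target)
    return pos < len(sorted_ids) and sorted_ids[pos] == target


print(smallest(index))        # 1204
print(contains(index, 5531))  # True
print(contains(index, 5532))  # False
```

In a B-tree the same idea applies: the smallest key sits at the leftmost leaf, and an existence check walks one path from the root to a leaf.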

I also wouldn't be all that worried about the size of the table. Where the width of a row is only one integer, you can fit a huge number of rows per index/data page.
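For a back-of-the-envelope feel of the numbers, here is a rough Python estimate. Only the 816k rows/day figure comes from the question; the page size, key width, per-row overhead, and fill factor are my assumptions about a typical InnoDB setup, and the calculation ignores page headers and non-leaf pages, so treat the result as an order of magnitude rather than a measurement.

```python
import math

ROWS_PER_DAY = 816_000          # estimate from the question
TARGET_ROWS = 1_000_000_000     # the "1+ billion rows" in the title

# Assumptions (not from the original post):
PAGE_BYTES = 16 * 1024          # default InnoDB page size
KEY_BYTES = 8                   # BIGINT key
ROW_OVERHEAD_BYTES = 18         # assumed record header + transaction id + roll pointer
FILL_FACTOR = 15 / 16           # assumed fill level for sequential inserts

# How many single-integer rows fit in one leaf page under these assumptions.
row_bytes = KEY_BYTES + ROW_OVERHEAD_BYTES
rows_per_page = int(PAGE_BYTES * FILL_FACTOR // row_bytes)

# Leaf pages and total size needed to hold the target row count.
pages_needed = math.ceil(TARGET_ROWS / rows_per_page)
total_gib = pages_needed * PAGE_BYTES / 2**30

# How long the stated insert rate takes to reach the target.
days_to_target = TARGET_ROWS / ROWS_PER_DAY

print(f"rows per page:   {rows_per_page}")
print(f"leaf pages:      {pages_needed}")
print(f"approx. size:    {total_gib:.1f} GiB")
print(f"days to 1B rows: {days_to_target:.0f}")
```

Under these assumptions a billion rows lands in the tens of gigabytes and takes a bit over three years to accumulate at the stated rate.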

ʞɔıu answered Oct 15 '22 23:10