
Is InnoDB (MySQL 5.5.8) the right choice for multi-billion rows?

Tags: mysql, storage

So, one of my tables in MySQL, which uses the InnoDB storage engine, will contain billions of rows (with potentially no limit to how many will be inserted).

Can you tell me what sort of optimizations I can do to help speed things up? Because with just a few million rows already, it is starting to get slow.

Of course, if you suggest using something else: the only options I have are PostgreSQL and SQLite3, but I've been told that SQLite3 is not a good choice for this. As for PostgreSQL, I have absolutely no idea how it performs, as I've never used it.

I expect at least about 1,000-1,500 inserts per second into that table.

asked May 25 '11 by dikidera

People also ask

Can MySQL handle billions of records?

Yes, it can handle billions of records. If you properly index the tables, the indexes fit in memory, and your queries are written properly, then it shouldn't be an issue.

Can MySQL handle large tables?

Today's largest MySQL installations cannot scale by using MySQL as a single system and must rely on sharding — or splitting a data set over multiple nodes or instances. However, most sharding solutions in MySQL are manual and make application code more complex.


2 Answers

The simple answer to your question is yes: InnoDB is a perfectly good choice for a multi-billion row data set.

There is a host of optimizations possible.

The most obvious optimization is setting a large buffer pool. The buffer pool is the single most important setting for InnoDB, because InnoDB caches both data and indexes in it. If you have a dedicated MySQL server with only InnoDB tables, then you can set up to 80% of the available RAM to be used by the buffer pool.
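As a minimal my.cnf sketch, assuming a dedicated server with 32 GB of RAM (the concrete numbers are illustrative, not from the answer):

    [mysqld]
    # roughly 80% of RAM on a dedicated, InnoDB-only server
    innodb_buffer_pool_size      = 26G
    # available from MySQL 5.5: split the pool to reduce mutex contention
    innodb_buffer_pool_instances = 4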

Another very important optimization is having proper indexes on the table (keeping the data access/update pattern in mind), both primary and secondary. (Remember that InnoDB automatically appends the primary key columns to every secondary index.)
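For illustration, here is a hypothetical schema (the table and column names are invented, not from the question) where lookups are by user and time range:

    -- Hypothetical table: rows are fetched by user_id and time range.
    CREATE TABLE events (
        id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        user_id    INT UNSIGNED    NOT NULL,
        created_at DATETIME        NOT NULL,
        payload    VARBINARY(255)  NOT NULL,
        PRIMARY KEY (id),
        -- InnoDB appends the primary key to this secondary index,
        -- so it effectively covers (user_id, created_at, id).
        KEY idx_user_time (user_id, created_at)
    ) ENGINE=InnoDB;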

With InnoDB there are some extra goodies, such as protection from data corruption, auto-recovery etc.

As for increasing write performance, you should set up your transaction (redo) log files to total up to 4 GB, which is the combined cap in MySQL 5.5.
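A sketch of the corresponding my.cnf settings (the exact values are illustrative):

    [mysqld]
    # two redo log files of ~2 GB each, near the 4 GB combined cap in 5.5
    innodb_log_file_size      = 2000M
    innodb_log_files_in_group = 2
    # a larger log buffer reduces flushing on write-heavy workloads
    innodb_log_buffer_size    = 64M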

One other thing that you can do is partition the table.
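As a hypothetical sketch, using range partitioning by year on the events table from above (in MySQL 5.5 the partitioning column must be part of every unique key, hence the widened primary key):

    -- Widen the primary key so it includes the partitioning column.
    ALTER TABLE events
        DROP PRIMARY KEY,
        ADD PRIMARY KEY (id, created_at);

    -- Partition by year; new partitions can be added as data grows.
    ALTER TABLE events
        PARTITION BY RANGE (YEAR(created_at)) (
            PARTITION p2010 VALUES LESS THAN (2011),
            PARTITION p2011 VALUES LESS THAN (2012),
            PARTITION pmax  VALUES LESS THAN MAXVALUE
        );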

You can eke out more performance by setting binlog_format to ROW and innodb_autoinc_lock_mode to 2 (the "interleaved" mode, which ensures InnoDB does not hold table-level locks when inserting into auto-increment columns; row-based binary logging is what makes this mode safe for replication).
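In my.cnf form, those two settings look like:

    [mysqld]
    # row-based logging makes autoinc lock mode 2 replication-safe
    binlog_format            = ROW
    # 2 = "interleaved": no table-level AUTO_INCREMENT locks
    innodb_autoinc_lock_mode = 2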

If you need any specific advice, you can contact me; I would be more than willing to help.

answered Nov 15 '22 by ovais.tariq


Optimizations

  • Take care not to have too many indexes; they are expensive when inserting.
  • Make your datatypes fit your data, as tightly as you can (so don't go saving IP addresses in a TEXT or a BLOB, if you know what I mean; see the sketch after this list). Look into VARCHAR vs. CHAR. Don't forget that because VARCHAR is more flexible, you are trading some things in; if you know a lot about your data, it might help to use CHAR, or it might be clearly better to use VARCHAR.
  • Do you read from this table at all? If so, you might want to do all the reading from a replicated slave, although your connection should be good enough for that amount of data.
  • If your inserts are big (aside from the sheer number of inserts), make sure your I/O is actually quick enough to handle the load.
  • I don't think there is any reason MySQL wouldn't support this. The things that can slow you down on the way from "thousands" to "millions" to "billions" are stuff like the aforementioned indexes. There is, as far as I know, no "MySQL is full" problem.
  • Look into partial indexes. From Wikipedia (the quickest source I could find; I didn't check the references, but I'm sure you can manage):

MySQL as of version 5.4 does not support partial indexes.[3] In MySQL, the term "partial index" is sometimes used to refer to prefix indexes, where only a truncated prefix of each value is stored in the index. This is another technique for reducing index size.[4]
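A hedged illustration of the tight-datatypes and prefix-index points above (the table and column names are hypothetical):

    -- Store IPv4 addresses as 4-byte integers instead of strings.
    CREATE TABLE visits (
        id  BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        ip  INT UNSIGNED    NOT NULL,   -- 4 bytes vs. 15+ for a VARCHAR
        url VARCHAR(2048)   NOT NULL,
        -- Prefix index: only the first 32 characters are stored in the
        -- index, which keeps a huge index much smaller.
        KEY idx_url (url(32))
    ) ENGINE=InnoDB;

    -- Converting on the way in and out:
    INSERT INTO visits (ip, url)
        VALUES (INET_ATON('192.0.2.10'), 'https://example.com/');
    SELECT INET_NTOA(ip) AS ip, url FROM visits;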

answered Nov 15 '22 by Nanne