MySQL and a table with 100+ millions of rows

Tags:

mysql

I have a few tables with more than 100 million rows each, and I gain about 20-40 million new rows every month.

At the moment everything seems fine:

- all inserts are fast
- all selects are fast (they use indexes and don't involve complex aggregations)

However, I am worried about two things I've read about:

- When a table has a few hundred million rows, inserts might become slow, because it can take a while to rebalance the indexes (B-trees).
- If an index doesn't fit into memory, it can take a while to read it from different parts of the disk.

Any comments would be highly appreciated, as would any suggestions on how to avoid these problems or how to fix/mitigate them if/when they happen.

(I know we should start sharding at some point.)

Thank you in advance.

asked Jul 13 '16 by user3489820



1 Answer

Today is the day you should think about sharding or partitioning, because if you have 100MM rows today and you're gaining them at ~30MM per month, you're going to double that size in three or four months, and possibly double it again before the year is out.

At some point you'll hit an event horizon where your database is too big to migrate. Either you don't have enough working space left on your disk to switch to an alternate schema, or you don't have enough down-time to perform the migration before it needs to be operational again. Then you're stuck with it forever as it gets slower and slower.

The performance of write activity on a table is largely a function of how difficult the indices are to maintain. The more data you index, the more punishing writes can be. The type of index matters too; some are more compact than others. If your data is lightly indexed you can usually get away with having more records before things start to get cripplingly slow, but that degradation point is highly dependent on your system configuration, your hardware, and your IO capacity.

Remember, InnoDB, the engine you should be using, has a lot of tuning parameters, and many people leave them at really terrible defaults. Have a look at the memory allocated to it and be sure you're sizing that properly.
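As a rough illustration (assuming InnoDB and a server dedicated to the database), the single most important of those parameters is usually innodb_buffer_pool_size. You can check it, and on MySQL 5.7+ resize it without a restart:

    -- how much memory InnoDB currently has for caching data and indexes
    SELECT ROUND(@@innodb_buffer_pool_size / 1024 / 1024 / 1024, 1) AS buffer_pool_gb;

    -- MySQL 5.7+ can resize the pool online; older versions need a my.cnf change
    -- and a restart. 8 GB is only an example figure; a common rule of thumb is
    -- 50-75% of RAM on a dedicated database server.
    SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;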

If you have any way of partitioning this data, like by month, by customer, or some other factor that will not change based on business logic (that is, the data in one partition is intrinsically unrelated to the data in another), you will have many simple options. If not, you'll have to make some hard decisions.
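For example, here is a minimal sketch of partitioning by month, using a hypothetical events table with a created_at datetime column (in MySQL the partitioning column must be part of the primary key and every unique key on the table):

    -- hypothetical table and column names, purely for illustration
    ALTER TABLE events
      PARTITION BY RANGE (TO_DAYS(created_at)) (
        PARTITION p2016_06 VALUES LESS THAN (TO_DAYS('2016-07-01')),
        PARTITION p2016_07 VALUES LESS THAN (TO_DAYS('2016-08-01')),
        PARTITION p_future VALUES LESS THAN MAXVALUE
      );

Old months can then be removed with ALTER TABLE ... DROP PARTITION, which is far cheaper than deleting tens of millions of rows one by one.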

The one thing you want to be doing now is simulating what your table's performance is like with 1G rows in it. Create a sufficiently large, suitably varied amount of test data, then see how well it performs under load. You may find it's not an issue, in which case, no worries for another few years. If not, start panicking today and working towards a solution before your data becomes too big to split.
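One way to do that, as a sketch with made-up table and column names, is to cross-join a small helper table against itself so that each additional join multiplies the number of generated rows by ten:

    -- helper table with ten rows
    CREATE TABLE digits (d TINYINT UNSIGNED PRIMARY KEY);
    INSERT INTO digits VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

    -- each run of this INSERT generates 10^7 rows of varied fake data;
    -- repeat it (or add more joins) until the table reaches the target size
    INSERT INTO events (created_at, customer_id, payload)
    SELECT NOW() - INTERVAL FLOOR(RAND() * 365) DAY,
           FLOOR(RAND() * 100000),
           MD5(RAND())
    FROM digits a, digits b, digits c, digits d, digits e, digits f, digits g;

Then run the same kind of queries your application issues, ideally on hardware comparable to production, and watch how the timings change as the row count grows.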

Database performance generally degrades in a fairly linear fashion, and then at some point it falls off a cliff. You need to know where this cliff is so you know how much time you have before you hit it. The sharp degradation in performance usually comes when your indexes can't fit in memory and when your disk buffers are stretched too thin to be useful.
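A quick, approximate way to see how close you are (assuming InnoDB; substitute your own schema name) is to compare the total size of your data and indexes against the buffer pool:

    -- sizes in MB; the information_schema figures are estimates,
    -- so treat the result as a ballpark, not an exact measurement
    SELECT ROUND(SUM(data_length)  / 1024 / 1024) AS total_data_mb,
           ROUND(SUM(index_length) / 1024 / 1024) AS total_index_mb,
           ROUND(@@innodb_buffer_pool_size / 1024 / 1024) AS buffer_pool_mb
    FROM information_schema.tables
    WHERE table_schema = 'your_database';

Once the working set you actually touch (hot data plus indexes) no longer fits in buffer_pool_mb, reads start going to disk and the cliff is close.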

answered Sep 22 '22 by tadman