MySQL and a table with 100+ millions of rows

Tags:

mysql

I have a few tables with more than 100 million rows each, and I gain about 20-40 million new rows every month.

At the moment everything seems fine:

- all inserts are fast
- all selects are fast (they use indexes and don't involve complex aggregations)

However, I am worried about two things I've read about:

- When a table has a few hundred million rows, inserts might become slow, because it can take a while to rebalance the indexes (B-trees).
- If an index doesn't fit into memory, it can take a while to read it from different parts of the disk.

Any comments would be highly appreciated, as would any suggestions on how to avoid these problems or how to fix/mitigate them if/when they happen.

(I know we should start sharding at some point.)

Thank you in advance.

asked Jul 13 '16 by user3489820



1 Answer

Today is the day you should think about sharding or partitioning, because if you have 100MM rows today and you're gaining them at ~30MM per month, you're going to double that size in three or four months, and possibly double it again before the year is out.

At some point you'll hit an event horizon where your database is too big to migrate. Either you don't have enough working space left on your disk to switch to an alternate schema, or you don't have enough down-time to perform the migration before it needs to be operational again. Then you're stuck with it forever as it gets slower and slower.

The performance of write activity on a table is largely a function of how difficult the indices are to maintain. The more data you index, the more punishing writes can be. The type of index matters too; some are more compact than others. If your data is lightly indexed you can usually get away with having more records before things start to get cripplingly slow, but that degradation point is highly dependent on your system configuration, your hardware, and your IO capacity.

Remember, InnoDB, the engine you should be using, has a lot of tuning parameters, and many people leave them at really terrible defaults. Have a look at the memory allocated to it and be sure you're sizing that properly.
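As a rough illustration (assuming InnoDB and a server dedicated to the database), the single most important of those parameters is usually innodb_buffer_pool_size. You can check it, and on MySQL 5.7+ resize it without a restart:

    -- how much memory InnoDB currently has for caching data and indexes
    SELECT ROUND(@@innodb_buffer_pool_size / 1024 / 1024 / 1024, 1) AS buffer_pool_gb;

    -- MySQL 5.7+ can resize the pool online; older versions need a my.cnf change
    -- and a restart. 8 GB is only an example figure; a common rule of thumb is
    -- 50-75% of RAM on a dedicated database server.
    SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;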

If you have any way of partitioning this data, like by month, by customer, or some other factor that will not change based on business logic (that is, the data in one partition is intrinsically unrelated to the data in another), you will have many simple options. If not, you'll have to make some hard decisions.
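For example, here is a minimal sketch of partitioning by month, using a hypothetical events table with a created_at datetime column (in MySQL the partitioning column must be part of the primary key and every unique key on the table):

    -- hypothetical table and column names, purely for illustration
    ALTER TABLE events
      PARTITION BY RANGE (TO_DAYS(created_at)) (
        PARTITION p2016_06 VALUES LESS THAN (TO_DAYS('2016-07-01')),
        PARTITION p2016_07 VALUES LESS THAN (TO_DAYS('2016-08-01')),
        PARTITION p_future VALUES LESS THAN MAXVALUE
      );

Old months can then be removed with ALTER TABLE ... DROP PARTITION, which is far cheaper than deleting tens of millions of rows one by one.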

The one thing you want to be doing now is simulating what your table's performance is like with 1G rows in it. Create a sufficiently large, suitably varied amount of test data, then see how well it performs under load. You may find it's not an issue, in which case, no worries for another few years. If not, start panicking today and working towards a solution before your data becomes too big to split.
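One way to do that, as a sketch with made-up table and column names, is to cross-join a small helper table against itself so that each additional join multiplies the number of generated rows by ten:

    -- helper table with ten rows
    CREATE TABLE digits (d TINYINT UNSIGNED PRIMARY KEY);
    INSERT INTO digits VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

    -- each run of this INSERT generates 10^7 rows of varied fake data;
    -- repeat it (or add more joins) until the table reaches the target size
    INSERT INTO events (created_at, customer_id, payload)
    SELECT NOW() - INTERVAL FLOOR(RAND() * 365) DAY,
           FLOOR(RAND() * 100000),
           MD5(RAND())
    FROM digits a, digits b, digits c, digits d, digits e, digits f, digits g;

Then run the same kind of queries your application issues, ideally on hardware comparable to production, and watch how the timings change as the row count grows.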

Database performance generally degrades in a fairly linear fashion, and then at some point it falls off a cliff. You need to know where this cliff is so you know how much time you have before you hit it. The sharp degradation in performance usually comes when your indexes can't fit in memory and when your disk buffers are stretched too thin to be useful.
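A quick, approximate way to see how close you are (assuming InnoDB; substitute your own schema name) is to compare the total size of your data and indexes against the buffer pool:

    -- sizes in MB; the information_schema figures are estimates,
    -- so treat the result as a ballpark, not an exact measurement
    SELECT ROUND(SUM(data_length)  / 1024 / 1024) AS total_data_mb,
           ROUND(SUM(index_length) / 1024 / 1024) AS total_index_mb,
           ROUND(@@innodb_buffer_pool_size / 1024 / 1024) AS buffer_pool_mb
    FROM information_schema.tables
    WHERE table_schema = 'your_database';

Once the working set you actually touch (hot data plus indexes) no longer fits in buffer_pool_mb, reads start going to disk and the cliff is close.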

answered Sep 22 '22 by tadman