Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Handling very large data with mysql

Sorry for the long post!

I have a database containing ~30 tables (InnoDB engine). Only two of these tables, namely, "transaction" and "shift" are quite large (the first one have 1.5 million rows and shift has 23k rows). Now everything works fine and I don't have problem with the current database size.

However, we will have a similar database (same datatypes, design ,..) but much larger, e.g., the "transaction" table will have about 1 billion records (about 2,3 million transaction per day) and we are thinking about how we should deal with such volume of data in MySQL? (it is both read and write intensive). I read a lot of related posts to see if Mysql (and more specifically InnoDB engine) can perform well with billions of records, but still I have some questions. Some of those related posts that I've read are in the following:

  • Can MySQL reasonably perform queries on billions of rows?
  • Is InnoDB (MySQL 5.5.8) the right choice for multi-billion rows?
  • Best data store for billions of rows
  • How big can a MySQL database get before performance starts to degrade
  • Why MySQL could be slow with large tables?
  • Can Mysql handle tables which will hold about 300 million records?

What I've understood so far to improve the performance for very large tables:

  1. (for innoDB tables which is my case) increasing the innodb_buffer_pool_size (e.g., up to 80% of RAM). Also, I found some other MySQL performance tunning settings here in percona blog
  2. having proper indexes on the table (using EXPLAN on queries)
  3. partitioning the table
  4. MySQL Sharding or clustering

Here are my questions/confusions:

  • About partitioning, I have some doubts whether we should use it or not. On one hand many people suggested it to improve performance when table is very large. On the other hand, I've read many posts saying it does not improve query performance and it does not make queries run faster (e.g., here and here). Also, I read in MySQL Reference Manual that InnoDB foreign keys and MySQL partitioning are not compatible (we have foreign keys).

  • Regarding indexes, right now they perform well, but as far as I understood, for very large tables indexing is more restrictive (as Kevin Bedell mentioned in his answer here). Also, indexes speed up reads while slow down write (insert/update). So, for the new similar project that we will have this large DB, should we first insert/load all the data and then create indexes? (to speed up the insert)

  • If we cannot use partitioning for our big table ("transaction" table), what is an alternative option to improve the performance? (except MySQl variable settings such as innodb_buffer_pool_size). Should we use Mysql clusters? (we have also lots of joins)

EDIT

This is the show create table statement for our largest table named "transaction":

  CREATE TABLE `transaction` (  `id` int(11) NOT NULL AUTO_INCREMENT,  `terminal_transaction_id` int(11) NOT NULL,  `fuel_terminal_id` int(11) NOT NULL,  `fuel_terminal_serial` int(11) NOT NULL,  `xboard_id` int(11) NOT NULL,  `gas_station_id` int(11) NOT NULL,  `operator_id` text NOT NULL,  `shift_id` int(11) NOT NULL,  `xboard_total_counter` int(11) NOT NULL,  `fuel_type` int(11) NOT NULL,  `start_fuel_time` int(11) NOT NULL,  `end_fuel_time` int(11) DEFAULT NULL,  `preset_amount` int(11) NOT NULL,  `actual_amount` int(11) DEFAULT NULL,  `fuel_cost` int(11) DEFAULT NULL,  `payment_cost` int(11) DEFAULT NULL,  `purchase_type` int(11) NOT NULL,  `payment_ref_id` text,  `unit_fuel_price` int(11) NOT NULL,  `fuel_status_id` int(11) DEFAULT NULL,  `fuel_mode_id` int(11) NOT NULL,  `payment_result` int(11) NOT NULL,  `card_pan` text,  `state` int(11) DEFAULT NULL,  `totalizer` int(11) NOT NULL DEFAULT '0',  `shift_start_time` int(11) DEFAULT NULL,  PRIMARY KEY (`id`),  UNIQUE KEY `terminal_transaction_id` (`terminal_transaction_id`,`fuel_terminal_id`,`start_fuel_time`) USING BTREE,  KEY `start_fuel_time_idx` (`start_fuel_time`),  KEY `fuel_terminal_idx` (`fuel_terminal_id`),  KEY `xboard_idx` (`xboard_id`),  KEY `gas_station_id` (`gas_station_id`) USING BTREE,  KEY `purchase_type` (`purchase_type`) USING BTREE,  KEY `shift_start_time` (`shift_start_time`) USING BTREE,  KEY `fuel_type` (`fuel_type`) USING BTREE ) ENGINE=InnoDB AUTO_INCREMENT=1665335 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT 

Thanks for your time,

like image 284
mOna Avatar asked Sep 26 '16 10:09

mOna


People also ask

Can MySQL handle large amount of data?

MySQL was not designed for running complicated queries against massive data volumes which requires crunching through a lot of data on a huge scale. MySQL optimizer is quite limited, executing a single query at a time using a single thread.

How does MySQL handle millions of data?

Show activity on this post. As already mentioned, fetching 2.5 mio entries requires loads of memory / cpu power. Try fetching the records in batches. If that's not solving your problem, you should consider finding a better way to not loop through such an amount of records each time.

Can MySQL handle 1 million records?

The MySQL maximum row size limit of 65,535 bytes is demonstrated in the following InnoDB and MyISAM examples. The limit is enforced regardless of storage engine, even though the storage engine may be capable of supporting larger rows.

Can MySQL store large data?

MySql have Unlocked New Big Data Insights with MySQL & Hadoop. Soluation 1: You can use MySQL as a Document Store. There are possible to store many many object as JSON. It highly recommended and Extendable.

How do you handle big data with PHP/MySQL?

What is the best way to handle big data with PHP/MySQL? Develop PHP and Web projects with PhpStorm. Great code assistance, smart debugger, safe refactorings, all major PHP frameworks support. Try for free! It depends on what you need and what you want to store. However, MySQL is not the best choice to big data.

How to handle large volumes of data in MySQL?

The historical (but perfectly valid) approach to handling large volumes of data is to implement partitioning. The idea behind it is to split table into partitions, sort of a sub-tables. The split happens according to the rules defined by the user. Let’s take a look at some of the examples (the SQL examples are taken from MySQL 8.0 documentation)

What are the disadvantages of MySQL?

The lack of a memory-centered search engine can result in high overhead and performance bottlenecks. Handling large data volumes requires techniques such as shading and splitting data over multiple nodes to get around the single-node architecture of MySQL. Processing volatile data can pose a problem in MySQL.

Is MySQL the best choice for big data?

However, MySQL is not the best choice to big data. Choose some NoSQL solutions or special designed database systems for big data like Hadoop. > PHP 7.2 is very fast and you can use it for big data purpose, but don't forget about infrastructure (AWS is most popular solution in this area) > So, as everything,...


Video Answer


1 Answers

  • Can MySQL reasonably perform queries on billions of rows? -- MySQL can 'handle' billions of rows. "Reasonably" depends on the queries; let's see them.

  • Is InnoDB (MySQL 5.5.8) the right choice for multi-billion rows? -- 5.7 has some improvements, but 5.5 is pretty good, in spite of being nearly 6 8 years old, and on the verge of no longer being supported.

  • Best data store for billions of rows -- If you mean 'Engine', then InnoDB.

  • How big can a MySQL database get before the performance starts to degrade -- Again, that depends on the queries. I can show you a 1K row table that will meltdown; I have worked with billion-row tables that hum along.

  • Why MySQL could be slow with large tables? -- range scans lead to I/O, which is the slow part.

  • Can Mysql handle tables which will hold about 300 million records? -- again, yes. The limit is somewhere around a trillion rows.

  • (for InnoDB tables which is my case) increasing the innodb_buffer_pool_size (e.g., up to 80% of RAM). Also, I found some other MySQL performance tunning settings here in Percona blog -- yes

  • having proper indexes on the table (using EXPLAIN on queries) -- well, let's see them. There are a lot of mistakes that can be made in this critical area.

  • partitioning the table -- "Partitioning is not a panacea!" I harp on that in my blog

  • MySQL Sharding -- Currently this is DIY

  • MySQL clustering -- Currently the best answer is some Galera-based option (PXC, MariaDB 10, DIY w/Oracle). Oracle's "Group Replication" is a viable contender.

  • Partitioning does not support FOREIGN KEY or "global" UNIQUE.

  • UUIDs, at the scale you are talking about, will not just slow down the system, but actually kill it. Type 1 UUIDs may be a workaround.

  • Insert and index-build speed -- There are too many variations to give a single answer. Let's see your tentative CREATE TABLE and how you intend to feed the data in.

  • Lots of joins -- "Normalize, but don't over-normalize." In particular, do not normalize datetimes or floats or other "continuous" values.

  • Do build summary tables

  • 2,3 million transactions per day -- If that is 2.3M inserts (30/sec), then there is not much of a performance problem. If more complex, then RAID, SSD, batching, etc, may be necessary.

  • deal with such volume of data -- If most activity is with the "recent" rows, then the buffer_pool will nicely 'cache' the activity, thereby avoiding I/O. If the activity is "random", then MySQL (or anyone else) will have I/O issues.

  • Shrinking the datatypes helps in a table like yours. I doubt if you need 4 bytes to specify fuel_type. There are multiple 1-byte approaches.

like image 121
Rick James Avatar answered Sep 21 '22 09:09

Rick James