How to do a one-time load for 4 billion records from MySQL to SQL Server

We need to do an initial data copy of a table with 4+ billion records from source MySQL (5.5) to target SQL Server (2014). The table in question is fairly wide, with 55 columns, but none of them are LOBs. I'm looking for the most efficient way possible to copy this data.

We've tried loading via Attunity Replicate (which has worked wonderfully for tables not this large), but if the initial data copy with Attunity Replicate fails, it starts over from scratch, losing whatever time was spent copying the data. With patching and the possibility of this table taking 3+ months to load, Attunity wasn't the solution.

We've also tried smaller batch loads with a linked server. This is working but doesn't seem efficient at all.
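For reference, the batched linked-server approach can be made restartable, which is the main advantage over a tool that starts over on failure. The following is a hedged T-SQL sketch only: the linked server name `MYSQL_LINK`, the table `dbo.BigTable` / `bigdb.big_table`, and the integer key `id` are all illustrative assumptions, not details from the post.

```sql
-- Sketch: restartable range-based copy over a linked server.
-- Assumes the source table has an integer primary key `id`.
DECLARE @from  BIGINT = 0,            -- resume point: last range already copied
        @max   BIGINT = 4200000000,   -- SELECT MAX(id) on the source, captured once
        @batch BIGINT = 1000000,
        @sql   NVARCHAR(MAX);

WHILE @from < @max
BEGIN
    -- OPENQUERY does not accept variables, so the statement is built dynamically.
    SET @sql = N'
        INSERT INTO dbo.BigTable WITH (TABLOCK) (id, col1, col2)
        SELECT id, col1, col2
        FROM OPENQUERY(MYSQL_LINK, ''SELECT id, col1, col2
            FROM bigdb.big_table
            WHERE id > '  + CAST(@from AS NVARCHAR(20)) + N'
              AND id <= ' + CAST(@from + @batch AS NVARCHAR(20)) + N''');';
    EXEC sp_executesql @sql;

    SET @from = @from + @batch;
    -- Log @from to a checkpoint table here so a failed run can resume
    -- from the last completed range instead of starting over.
END
```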

Once the data is copied we will be using Attunity Replicate to handle CDC.

asked May 14 '19 by Cody Brumett

People also ask

How do I load a large amount of data into MySQL?

If you're using MySQL, the fastest way to load a very large number of records into a table is the following: get the data into CSV format, then use LOAD DATA to load it either into your table directly, or into a working table if you need post-load SQL operations to map the data into your final table.
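A minimal sketch of the LOAD DATA approach described above; the file path, table name, and column list are illustrative assumptions:

```sql
-- Bulk-load a CSV file into a staging table.
LOAD DATA INFILE '/tmp/big_table.csv'
INTO TABLE staging_big_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES          -- skip the CSV header row
(id, col1, col2);
```

If the file lives on the client rather than the server, `LOAD DATA LOCAL INFILE` can be used instead, subject to the server's `local_infile` setting.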

How many records does a single MySQL server have?

In my previous company, we had single-instance MySQL servers with 15 billion+ records in single instances, and 3 billion records in single tables. The only “trick” we used was carefully-designed partitioning.

Is 100000 rows a large input file in MySQL?

Note that 100,000 rows is not necessarily a large input file; it depends. dbload can load data from a delimited or fixed-field-length flat file into multiple tables, with partial commits every N rows. It is faster than using dbaccess's LOAD and avoids the issues mentioned.

How to load multiple files into a single table in SQL?

The basic SQL tool, dbaccess, supplies a LOAD verb that reads flat files in delimited format and can use bulk-load techniques to load that data into a single table. The inserts are performed as a single transaction, which can cause concurrency problems and may result in a long transaction rollback. It is discouraged for larger input files.
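For illustration, the dbaccess LOAD verb mentioned above looks like this (an Informix sketch; the file name, delimiter, and table name are assumptions):

```sql
-- Load a pipe-delimited flat file into one table, as a single transaction.
LOAD FROM 'big_table.unl' DELIMITER '|'
INSERT INTO big_table;
```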


1 Answer

For something like this I think SSIS would be the simplest option. It's designed for large inserts, even on the order of 1 TB. In fact, I'd recommend this MSDN article: We Loaded 1TB in 30 Minutes and So Can You.

Doing simple things like dropping indexes and applying other optimizations like partitioning would make your load faster. While 30 minutes isn't a feasible time to shoot for here, it would be a very straightforward task to have an SSIS package run outside of business hours.
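The index-dropping step above can be sketched in T-SQL; the index and table names here are illustrative, not from the post:

```sql
-- Disable nonclustered indexes before the bulk load so inserts don't
-- pay index-maintenance cost, then rebuild them once the load finishes.
ALTER INDEX IX_BigTable_col1 ON dbo.BigTable DISABLE;

-- ... run the SSIS load here ...

ALTER INDEX ALL ON dbo.BigTable REBUILD;
```

Disabling (rather than dropping) the indexes keeps their definitions in the catalog, so no DDL needs to be replayed afterwards.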

My business doesn't have a load on the scale yours does, but we do refresh databases of more than 100M rows nightly, and it doesn't take more than 45 minutes, even though the process is poorly optimized.

answered Oct 16 '22 by PausePause