
import bulk data into MySQL

So I'm trying to import some sales data into my MySQL database. The data is originally in the form of a raw CSV file, which my PHP application needs to first process, then save the processed sales data to the database.

Initially I was doing individual INSERT queries, which I realized was incredibly inefficient (~6,000 queries taking almost 2 minutes). I then generated a single large multi-row query and INSERTed all the data at once. That gave me roughly a 3,400% increase in efficiency and reduced the query time to just over 3 seconds.
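For illustration, a multi-row INSERT batches many records into one statement; the `sales` table and its columns below are hypothetical stand-ins for the real schema:

```sql
-- Hypothetical table and columns, shown only to illustrate the
-- multi-row form; one round trip replaces thousands of queries:
INSERT INTO sales (sale_date, product_id, quantity, amount)
VALUES
    ('2022-12-01', 101, 2, 19.98),
    ('2022-12-01', 205, 1,  4.50),
    ('2022-12-02', 101, 5, 49.95);
    -- ...continue with one tuple per processed row
```

Keep an eye on max_allowed_packet if the combined statement gets very large.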

But as I understand it, LOAD DATA INFILE is supposed to be even quicker than any sort of INSERT query. So now I'm thinking about writing the processed data to a text file and using LOAD DATA INFILE to import it into the database. Is this the optimal way to insert large amounts of data into a database? Or am I going about this entirely the wrong way?
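As a sketch of what the LOAD DATA route might look like (the file path, column list, and delimiters here are assumptions to adapt to the actual schema):

```sql
-- Assumed path and columns; LOCAL requires local_infile to be
-- enabled on both the client and the server:
LOAD DATA LOCAL INFILE '/tmp/processed_sales.csv'
INTO TABLE sales
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(sale_date, product_id, quantity, amount);
```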

I know a few thousand rows of mostly numeric data isn't a lot in the grand scheme of things, but I'm trying to make this intranet application as quick/responsive as possible. And I also want to make sure that this process scales up in case we decide to license the program to other companies.

UPDATE:

So I did go ahead and test LOAD DATA INFILE as suggested, thinking it might give only marginal speed increases (since I was now writing the same data to disk twice), but I was surprised when it cut the query time from over 3,300 ms down to ~240 ms. The page still takes ~1,500 ms to execute in total, but that's noticeably better than before.

From here I guess I'll check for any superfluous indexes in the database, and, since all but two of my tables are InnoDB, look into tuning the InnoDB buffer pool to improve overall performance.
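For anyone checking the same things: the index audit can start with SHOW INDEX, and the buffer pool can be resized dynamically on MySQL 5.7+. The table name and the 1 GiB figure below are purely illustrative:

```sql
-- List all indexes on a table to spot redundant ones:
SHOW INDEX FROM sales;

-- Resize the buffer pool without a restart (MySQL 5.7+);
-- 1 GiB is an example value, not a recommendation:
SET GLOBAL innodb_buffer_pool_size = 1073741824;
```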

Lèse majesté asked Dec 17 '22

2 Answers

LOAD DATA INFILE is very fast, and it is the right way to import text files into MySQL. It is one of the recommended methods for speeding up data insertion: up to 20 times faster than INSERT statements, according to the MySQL manual:

https://dev.mysql.com/doc/refman/8.0/en/insert-optimization.html

Assuming that writing the processed data back to a text file (and then loading it) is faster than inserting it into the database directly, this is a good way to go.

Mike answered Dec 26 '22


LOAD DATA or multi-row inserts are going to be much better than single-row inserts; beyond that, LOAD DATA saves you only a small additional amount that you probably don't care about that much.

In any case, do quite a lot, but not too much, in one transaction: 10,000 rows per transaction generally feels about right (NB: this is not relevant to non-transactional engines). If your transactions are too small, the server will spend all its time syncing the log to disk.
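The batching above can be sketched as follows; the INSERT bodies are elided since any row format works:

```sql
-- Commit once per ~10,000 rows so the log is flushed per batch,
-- not per row:
START TRANSACTION;
-- ... roughly 10,000 INSERTs (or multi-row INSERTs) ...
COMMIT;

START TRANSACTION;
-- ... next batch ...
COMMIT;
```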

Most of the time spent on a big insert goes into building indexes, which is an expensive and memory-intensive operation.

If you need performance:

  • Have as few indexes as possible
  • Make sure the table and all its indexes fit in your InnoDB buffer pool (assuming InnoDB here)
  • Just add more RAM until your table fits in memory, unless that becomes prohibitively expensive (64 GB is not too expensive nowadays)
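To check the second point, one approach (an approximation, since information_schema sizes are estimates) is to compare total InnoDB data plus index size against the configured pool:

```sql
-- Rough total of InnoDB data + indexes, in MB:
SELECT ROUND(SUM(data_length + index_length) / 1024 / 1024) AS total_mb
FROM information_schema.TABLES
WHERE engine = 'InnoDB';

-- Compare against the pool size (reported in bytes):
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
```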

If you must use MyISAM, there are a few dirty tricks to make it faster, which I won't discuss further.

MarkR answered Dec 26 '22