So I'm trying to import some sales data into my MySQL database. The data is originally in the form of a raw CSV file, which my PHP application needs to first process, then save the processed sales data to the database.
Initially I was doing individual INSERT
queries, which I realized was incredibly inefficient (~6000 queries taking almost 2 minutes). I then generated a single large query and INSERT
ed the data all at once. That gave us a 3400% increase in efficiency, and reduced the query time to just over 3 seconds.
But as I understand it, LOAD DATA INFILE
is supposed to be even quicker than any sort of INSERT
query. So now I'm thinking about writing the processed data to a text file and using LOAD DATA INFILE
to import it into the database. Is this the optimal way to insert large amounts of data to a database? Or am I going about this entirely the wrong way?
I know a few thousand rows of mostly numeric data isn't a lot in the grand scheme of things, but I'm trying to make this intranet application as quick/responsive as possible. And I also want to make sure that this process scales up in case we decide to license the program to other companies.
UPDATE:
So I did go ahead and test LOAD DATA INFILE
out as suggested, thinking it might give me only marginal speed increases (since I was now writing the same data to disk twice), but I was surprised when it cut the query time from over 3300ms down to ~240ms. The page still takes about ~1500ms to execute total, but it's still noticeably better than before.
From here I guess I'll check to see if I have any superfluous indexes in the database, and, since all but two of my tables are InnoDB, I will look into optimizing the InnoDB buffer pool to optimize the overall performance.
LOAD DATA INFILE
is very fast, and is the right way to import text files into MySQL. It is one of the recommended methods for speeding up the insertion of data -up to 20 times faster, according to this:
https://dev.mysql.com/doc/refman/8.0/en/insert-optimization.html
Assuming that writing the processed data back to a text file is faster than inserting it into the database, then this is a good way to go.
LOAD DATA or multiple inserts are going to be much better than single inserts; LOAD DATA saves you a tiny little bit you probably don't care about that much.
In any case, do quite a lot but not too much in one transaction - 10,000 rows per transaction generally feels about right (NB: this is not relevant to non-transactional engines). If your transactions are too small then it will spend all its time syncing the log to disc.
Most of the time doing a big insert is going to come from building indexes, which is an expensive and memory-intensive operation.
If you need performance,
If you must use MyISAM, there are a few dirty tricks there to make it better which I won't discuss further.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With