I'm working with the MySQLdb module in Python to interact with a database. I have a situation where there is a very large list (tens of thousands of elements) which I need to insert as rows into a table.
My solution right now is to generate a large INSERT statement as a string and execute it. Is there a smarter way?
Using a bulk INSERT statement in MySQL. MySQL's INSERT statement supports a multi-row VALUES syntax, so a single statement can insert many rows at once: include multiple lists of column values, each enclosed in parentheses and separated by commas.
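For illustration, a minimal sketch of such a multi-row insert through MySQLdb; the table people(name, age) and the open connection con are assumptions, not part of the question:

cur = con.cursor()
# One statement, several rows: each parenthesised group is one row.
cur.execute(
    "INSERT INTO people (name, age) VALUES (%s, %s), (%s, %s), (%s, %s)",
    ("alice", 30, "bob", 25, "carol", 41),
)
con.commit()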
The executemany() method. This method prepares a database operation (query or command) and executes it against all parameter sequences or mappings found in the sequence seq_of_params. Note that in Python, a tuple containing a single value must still include a trailing comma, e.g. (value,).
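A sketch of the same insert using executemany(); again, people and con are assumed names. For simple INSERT ... VALUES statements, MySQLdb rewrites the batch into one multi-row statement under the hood:

rows = [("alice", 30), ("bob", 25), ("carol", 41)]
cur = con.cursor()
# Each tuple in rows supplies the parameters for one inserted row.
cur.executemany("INSERT INTO people (name, age) VALUES (%s, %s)", rows)
con.commit()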
There is a smarter way.
The problem with bulk insertions is that autocommit is enabled by default, which forces each INSERT statement to be flushed to stable storage before the next insert can begin.
As the manual page notes:
By default, MySQL runs with autocommit mode enabled. This means that as soon as you execute a statement that updates (modifies) a table, MySQL stores the update on disk to make it permanent. To disable autocommit mode, use the following statement:
SET autocommit=0;
After disabling autocommit mode by setting the autocommit variable to zero, changes to transaction-safe tables (such as those for InnoDB, BDB, or NDBCLUSTER) are not made permanent immediately. You must use COMMIT to store your changes to disk or ROLLBACK to ignore the changes.
This is a pretty common feature of RDBMS systems, which presume that database integrity is paramount. It can make bulk inserts take on the order of a second per insert instead of a millisecond. The alternative of building one oversized INSERT statement tries to achieve a single commit at the risk of overloading the SQL parser.
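A sketch of that advice with MySQLdb: make sure autocommit is off, run all the inserts, then commit once. The table and data names are placeholders:

import MySQLdb

con = MySQLdb.connect(host="localhost", user="user", passwd="**", db="mydb")
con.autocommit(False)   # defer the disk flush until the explicit commit below
cur = con.cursor()
rows = [("alice", 30), ("bob", 25)]
cur.executemany("INSERT INTO people (name, age) VALUES (%s, %s)", rows)
con.commit()            # one commit for the whole batch
con.close()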
As long as you're doing it as a single INSERT and not thousands of individual ones, then yes, this is the best way to do it. Watch out that you don't exceed MySQL's max packet size, and adjust it if necessary. For example, this sets the server's packet maximum to 32MB; you need to do the same on the client too.
mysqld --max_allowed_packet=32M
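If you want to check the limit before building a giant statement, one way (a sketch assuming an open MySQLdb connection con) is to ask the server:

cur = con.cursor()
cur.execute("SHOW VARIABLES LIKE 'max_allowed_packet'")
name, value = cur.fetchone()
print(name, int(value))   # value is in bytes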
If you have to insert a very large amount of data, why are you trying to insert all of it in one single INSERT? Building that huge INSERT string puts unnecessary pressure on memory, both while constructing it and while executing it, and it doesn't scale well if the data is very large. Why not issue one INSERT per row inside a for loop and commit all the changes at the end?
import MySQLdb

con = MySQLdb.connect(
    host="localhost",
    user="user",
    passwd="**",
    db="db name"
)
cur = con.cursor()
for data in your_data_list:
    # Parameterised query; your_table and your_column stand in for your schema.
    cur.execute("INSERT INTO your_table (your_column) VALUES (%s)", (data,))
con.commit()
con.close()
(Believe me, this is really fast, but if you're seeing slower results it means autocommit is probably still True; set it to False, as msw says.)