Python+MySQL - Bulk Insert

Tags: python, mysql

I'm working with the MySQLdb module in Python to interact with a database. I have a very large list (tens of thousands of elements) that I need to insert as rows into a table.

My solution right now is to generate a large INSERT statement as a string and execute it.

Is there a smarter way?

asked by Mike on Jun 26 '11


People also ask

Does MySQL have bulk insert?

Yes. The INSERT statement in MySQL supports VALUES syntax for inserting multiple rows as a single bulk statement: include multiple lists of column values, each enclosed in parentheses and separated by commas.
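For instance (a sketch using MySQLdb; the users table, its columns, and the connection details are made-up placeholders):

import MySQLdb

con = MySQLdb.connect(host="localhost", user="user", passwd="**", db="test")
cur = con.cursor()

# One statement carrying three rows: multiple parenthesized value
# lists after a single VALUES keyword, separated by commas.
cur.execute(
    "INSERT INTO users (name, age) "
    "VALUES (%s, %s), (%s, %s), (%s, %s)",
    ("Alice", 30, "Bob", 25, "Carol", 41),
)

con.commit()
con.close()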

What does the executemany() method do?

The executemany() method prepares a database operation (query or command) and executes it against all parameter sequences or mappings found in the sequence seq_of_params. In Python, a tuple containing a single value must include a trailing comma.
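A minimal sketch of executemany() with MySQLdb (again, the users table and connection details are placeholders):

import MySQLdb

con = MySQLdb.connect(host="localhost", user="user", passwd="**", db="test")
cur = con.cursor()

rows = [("Alice", 30), ("Bob", 25), ("Carol", 41)]

# executemany() applies the statement to every parameter tuple; for
# plain INSERT ... VALUES statements MySQLdb can collapse the batch
# into a single multi-row INSERT on the wire.
cur.executemany("INSERT INTO users (name, age) VALUES (%s, %s)", rows)

con.commit()
con.close()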


3 Answers

There is a smarter way.

The problem with bulk insertions is that, by default, autocommit is enabled, so each INSERT statement is flushed to stable storage before the next one can begin.

As the manual page notes:

By default, MySQL runs with autocommit mode enabled. This means that as soon as you execute a statement that updates (modifies) a table, MySQL stores the update on disk to make it permanent. To disable autocommit mode, use the following statement:

SET autocommit=0; 

After disabling autocommit mode by setting the autocommit variable to zero, changes to transaction-safe tables (such as those for InnoDB, BDB, or NDBCLUSTER) are not made permanent immediately. You must use COMMIT to store your changes to disk or ROLLBACK to ignore the changes.

This is a pretty common feature of RDBMSes, which presume that database integrity is paramount. It makes bulk inserts take on the order of 1 s per insert instead of 1 ms. The alternative, building one oversized INSERT statement, achieves a single commit but risks overloading the SQL parser.
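With MySQLdb, that pattern looks roughly like this (a sketch; the table, column, and connection details are placeholders):

import MySQLdb

con = MySQLdb.connect(host="localhost", user="user", passwd="**", db="test")
con.autocommit(False)  # MySQLdb already does this at connect time; being explicit
cur = con.cursor()

values_to_insert = ["a", "b", "c"]  # stand-in for the real data

for value in values_to_insert:
    cur.execute("INSERT INTO t (col) VALUES (%s)", (value,))

con.commit()  # one write to stable storage for the whole batch
con.close()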

answered by msw


As long as you're doing it as a single INSERT and not thousands of individual ones, then yes, this is the best way to do it. Watch out that you don't exceed MySQL's maximum packet size, and adjust it if necessary. For example, this sets the server's packet maximum to 32 MB; you need to do the same on the client side too.

mysqld --max_allowed_packet=32M
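If you would rather stay under the limit from the Python side, one approach (a sketch, not part of this answer; the table, data, and per-row size estimate are made up) is to read the server's current setting and chunk the batch accordingly:

import MySQLdb

con = MySQLdb.connect(host="localhost", user="user", passwd="**", db="test")
cur = con.cursor()

cur.execute("SHOW VARIABLES LIKE 'max_allowed_packet'")
max_packet = int(cur.fetchone()[1])  # current server limit, in bytes

rows = [("Alice", 30), ("Bob", 25)]  # stand-in for the real data
EST_ROW_BYTES = 64                   # rough size of one encoded row

# Size each batch to stay comfortably under the packet limit.
batch = max(1, max_packet // (EST_ROW_BYTES * 2))
for i in range(0, len(rows), batch):
    cur.executemany("INSERT INTO users (name, age) VALUES (%s, %s)",
                    rows[i:i + batch])

con.commit()
con.close()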

answered by justinhj


If you have to insert a very large amount of data, why try to insert all of it in one single INSERT? Building that huge INSERT string puts unnecessary load on memory, both while constructing it and while executing it, and the approach stops working once the data gets very, very large.

Why not issue one INSERT per row, looping over all the rows with a for loop, and commit all the changes at the end?

import MySQLdb

con = MySQLdb.connect(
    host="localhost",
    user="user",
    passwd="**",
    db="db name",
)
cur = con.cursor()

# One parameterized INSERT per row; the driver escapes each value.
# (your_data_list, your_table, and your_column are placeholders.)
for data in your_data_list:
    cur.execute("INSERT INTO your_table (your_column) VALUES (%s)", (data,))

con.commit()  # a single commit at the end flushes the whole batch
con.close()

(Believe me, this is really fast, but if you're seeing slow results it probably means autocommit is enabled. Set it to False, as msw says.)

answered by Pushpak Dagade