I'm looking for the most efficient way to bulk-insert several million tuples into a database. I'm using Python, PostgreSQL and psycopg2.
I have created a long list of tuples that should be inserted into the database, sometimes with modifiers like geometric Simplify.
The naive way to do it would be string-formatting a list of INSERT statements, but there are three other methods I've read about:

1. Using the pyformat binding style for parametric insertion
2. Using executemany on the list of tuples
3. Using COPY

It seems that the first way is the most efficient, but I would appreciate your insights and code snippets telling me how to do it right.
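For concreteness, a minimal sketch of the pyformat binding style with psycopg2 (the connection string, table, and column names are made up for illustration):

```python
import psycopg2

conn = psycopg2.connect("dbname=test")  # assumed connection string
cur = conn.cursor()

# pyformat style: named placeholders; psycopg2 handles quoting and escaping
row = {"id": 1, "geom": "POINT(0 0)"}
cur.execute(
    "INSERT INTO my_table (id, geom) VALUES (%(id)s, %(geom)s)",
    row,
)
conn.commit()
```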
PostgreSQL can INSERT multiple rows in a single statement. First, specify the name of the table you want to insert data into after the INSERT INTO keywords. Second, list the required columns (or all columns) of the table in parentheses following the table name. Third, supply a comma-separated list of rows after the VALUES keyword.
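A sketch of generating such a multi-row VALUES statement from Python: psycopg2.extras.execute_values (available since psycopg2 2.7) does the batching for you. The table and data here are hypothetical:

```python
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=test")  # assumed connection string
cur = conn.cursor()

rows = [(1, "a"), (2, "b"), (3, "c")]  # hypothetical data

# Expands into a single INSERT ... VALUES (1,'a'),(2,'b'),(3,'c') statement
execute_values(
    cur,
    "INSERT INTO my_table (id, label) VALUES %s",
    rows,
)
conn.commit()
```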
What if you want to insert multiple rows into a table in a single call from your Python application? Use the cursor's executemany() method to insert multiple records into a table; it is sketched below.
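A minimal sketch, again with a hypothetical table and connection string:

```python
import psycopg2

conn = psycopg2.connect("dbname=test")  # assumed connection string
cur = conn.cursor()

rows = [(1, "a"), (2, "b"), (3, "c")]

# executemany runs the statement once per tuple; convenient, but it issues
# many round trips, so it is slower than execute_values or COPY
cur.executemany(
    "INSERT INTO my_table (id, label) VALUES (%s, %s)",
    rows,
)
conn.commit()
```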
Psycopg allows asynchronous interaction with other database sessions using the facilities offered by the PostgreSQL commands LISTEN and NOTIFY. Please refer to the PostgreSQL documentation for examples of how to use this form of communication.
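As a rough sketch of the pattern (the channel name is made up; the loop follows the shape of the example in the psycopg2 documentation):

```python
import select
import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=test")  # assumed connection string
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

cur = conn.cursor()
cur.execute("LISTEN my_channel;")  # hypothetical channel name

while True:
    # Wait up to 5 seconds for the socket to become readable
    if select.select([conn], [], [], 5) == ([], [], []):
        print("timeout")
    else:
        conn.poll()
        while conn.notifies:
            notify = conn.notifies.pop(0)
            print("NOTIFY:", notify.pid, notify.channel, notify.payload)
```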
The current psycopg2 implementation supports Python 2 versions 2.6 and 2.7, and Python 3 versions from 3.2 to 3.6.
Yeah, I would vote for COPY, provided you can write a file to the server's hard drive (not the drive the app is running on), since COPY ... FROM 'file' will only read from the server's filesystem.
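Note that psycopg2's cursor.copy_from uses COPY ... FROM STDIN, which streams the data from the client, so no server-side file is needed. A minimal sketch with a hypothetical table:

```python
import io
import psycopg2

conn = psycopg2.connect("dbname=test")  # assumed connection string
cur = conn.cursor()

# Build a tab-separated buffer in memory instead of a server-side file
buf = io.StringIO()
for row in [(1, "a"), (2, "b"), (3, "c")]:
    buf.write("%d\t%s\n" % row)
buf.seek(0)

# Streams the buffer to the server over COPY ... FROM STDIN
cur.copy_from(buf, "my_table", sep="\t", columns=("id", "label"))
conn.commit()
```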
There is a new psycopg2 manual containing examples for all the options.
The COPY option is the most efficient, followed by executemany, and then execute with pyformat.
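A rough sketch of how you might time the three approaches yourself (the table, data, and connection string are hypothetical; absolute numbers will vary with your schema and hardware):

```python
import io
import time
import psycopg2

conn = psycopg2.connect("dbname=test")  # assumed connection string
cur = conn.cursor()
rows = [(i, "x") for i in range(100000)]  # hypothetical data

def timed(label, fn):
    cur.execute("TRUNCATE my_table")  # start each run from an empty table
    start = time.time()
    fn()
    conn.commit()
    print("%s: %.2fs" % (label, time.time() - start))

def with_execute():
    # One round trip per row: the slowest option
    for row in rows:
        cur.execute("INSERT INTO my_table (id, label) VALUES (%s, %s)", row)

def with_executemany():
    cur.executemany("INSERT INTO my_table (id, label) VALUES (%s, %s)", rows)

def with_copy():
    buf = io.StringIO("".join("%d\t%s\n" % r for r in rows))
    cur.copy_from(buf, "my_table", sep="\t", columns=("id", "label"))

timed("execute", with_execute)
timed("executemany", with_executemany)
timed("copy_from", with_copy)
```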