Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Psycopg2, Postgresql, Python: Fastest way to bulk-insert

I'm looking for the most efficient way to bulk-insert some millions of tuples into a database. I'm using Python, PostgreSQL and psycopg2.

I have created a long list of tulpes that should be inserted to the database, sometimes with modifiers like geometric Simplify.

The naive way to do it would be string-formatting a list of INSERT statements, but there are three other methods I've read about:

  1. Using pyformat binding style for parametric insertion
  2. Using executemany on the list of tuples, and
  3. Using writing the results to a file and using COPY.

It seems that the first way is the most efficient, but I would appreciate your insights and code snippets telling me how to do it right.

like image 722
Adam Matan Avatar asked Feb 16 '10 09:02

Adam Matan


People also ask

How do I run multiple inserts in PostgreSQL?

PostgreSQL INSERT Multiple Rows First, specify the name of the table that you want to insert data after the INSERT INTO keywords. Second, list the required columns or all columns of the table in parentheses that follow the table name. Third, supply a comma-separated list of rows after the VALUES keyword.

How do you insert multiple records in Python?

What if you want to insert multiple rows into a table in a single insert query from the Python application. Use the cursor's executemany() function to insert multiple records into a table. Syntax of the executemany() method.

Is psycopg2 asynchronous?

Psycopg allows asynchronous interaction with other database sessions using the facilities offered by PostgreSQL commands LISTEN and NOTIFY. Please refer to the PostgreSQL documentation for examples about how to use this form of communication.

Does psycopg2 work with Python 3?

The current psycopg2 implementation supports: Python 2 versions from 2.6 to 2.7. Python 3 versions from 3.2 to 3.6.


2 Answers

Yeah, I would vote for COPY, providing you can write a file to the server's hard drive (not the drive the app is running on) as COPY will only read off the server.

like image 88
Andy Shellam Avatar answered Sep 23 '22 06:09

Andy Shellam


There is a new psycopg2 manual containing examples for all the options.

The COPY option is the most efficient. Then the executemany. Then the execute with pyformat.

like image 27
piro Avatar answered Sep 22 '22 06:09

piro