I need to efficiently insert about 500k (give or take 100k) rows of data into my PostgreSQL database. After a generous amount of google-ing, I've gotten to this solution, averaging about 150 (wall-clock) seconds.
def db_insert_spectrum(curs, visual_data, recording_id):
sql = """
INSERT INTO spectrums (row, col, value, recording_id)
VALUES %s
"""
# Mass-insertion technique
# visual_data is a 2D array (a nx63 matrix)
values_list = []
for rowIndex, rowData in enumerate(visual_data):
for colIndex, colData in enumerate(rowData): # colData is the value
value = [(rowIndex, colIndex, colData, recording_id)]
values_list.append(value)
psycopg2.extras.execute_batch(curs, sql, values_list, page_size=1000)
Is there a faster way?
PostgreSQL INSERT Multiple Rows First, specify the name of the table that you want to insert data after the INSERT INTO keywords. Second, list the required columns or all columns of the table in parentheses that follow the table name. Third, supply a comma-separated list of rows after the VALUES keyword.
Due to its C implementation, Psycopg2 is very fast and efficient. You can use Psycopg2 to fetch one or more rows from the database based on a SQL query. If you want to insert some data into the database, that's also possible with this library — with multiple options for single or batch inserting.
Psycopg allows asynchronous interaction with other database sessions using the facilities offered by PostgreSQL commands LISTEN and NOTIFY .
mogrify() method returns a query string once the parameters have been bound. If you used the execute() method or anything similar, the string returned is the same as what would be sent to the database. The resulting string is always a bytes string, which is quicker than using the executemany() function. Syntax: cur.
Based on the answers given here, COPY is the fastest method. COPY
reads from a file or file-like object.
Since memory I/O is many orders of magnitude faster than disk I/O, it is faster to write the data to a StringIO
file-like object than to write to an actual file.
The psycopg docs show an example of calling copy_from
with a StringIO
as input.
Therefore, you could use something like:
try:
# Python2
from cStringIO import StringIO
except ImportError:
# Python3
from io import StringIO
def db_insert_spectrum(curs, visual_data, recording_id):
f = StringIO()
# visual_data is a 2D array (a nx63 matrix)
values_list = []
for rowIndex, rowData in enumerate(visual_data):
items = []
for colIndex, colData in enumerate(rowData):
value = (rowIndex, colIndex, colData, recording_id)
items.append('\t'.join(map(str, value))+'\n')
f.writelines(items)
f.seek(0)
cur.copy_from(f, 'spectrums', columns=('row', 'col', 'value', 'recording_id'))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With