
Bulk insert huge data into SQLite using Python

Tags:

python

sqlite

I read this: Importing a CSV file into a sqlite3 database table using Python

and it seems that everyone suggests using line-by-line reading instead of using bulk .import from SQLite. However, that will make the insertion really slow if you have millions of rows of data. Is there any other way to circumvent this?

Update: I tried the following code to insert line by line, but the speed is not as good as I expected. Is there any way to improve it?

for logFileName in allLogFilesName:
    logFile = codecs.open(logFileName, 'rb', encoding='utf-8')
    for logLine in logFile:
        logLineAsList = logLine.split('\t')
        output.execute('''INSERT INTO log VALUES(?, ?, ?, ?)''', logLineAsList)
    logFile.close()
connection.commit()
connection.close()
asked Aug 13 '13 by Shar

People also ask

How do I populate a SQLite database in Python?

First, connect to the SQLite database by creating a Connection object. Second, create a Cursor object by calling the cursor method of the Connection object. Third, execute an INSERT statement. If you want to pass arguments to the INSERT statement, you use the question mark (?) as the placeholder for each argument.
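
In code, those three steps look like this (a minimal sketch; the database file, table, and values are illustrative):

import sqlite3

# First: connect (creates example.db if it does not exist).
con = sqlite3.connect("example.db")

# Second: get a cursor from the connection.
cur = con.cursor()

# Third: execute a parameterized INSERT; each ? is bound in order.
cur.execute("CREATE TABLE IF NOT EXISTS person(firstname, lastname)")
cur.execute("INSERT INTO person VALUES (?, ?)", ("Ada", "Lovelace"))

con.commit()
con.close()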

How many records can SQLite hold?

Maximum Database Size: the maximum size of a database file is 4294967294 pages. At the maximum page size of 65536 bytes, this translates into a maximum database size of approximately 2.8e+14 bytes (281 terabytes, or 256 tebibytes, or 281,474 gigabytes or 262,144 gibibytes).


2 Answers

Since this is the top result on a Google search, I thought it might be nice to update this question.

From the Python sqlite3 docs, you can use executemany:

import sqlite3

persons = [
    ("Hugo", "Boss"),
    ("Calvin", "Klein")
]

con = sqlite3.connect(":memory:")

# Create the table
con.execute("create table person(firstname, lastname)")

# Fill the table
con.executemany("insert into person(firstname, lastname) values (?,?)", persons)

I have used this method to commit over 50k row inserts at a time and it's lightning fast.
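
Note that executemany accepts any iterable, not just a list, so for the log files in the question you can stream rows straight from disk instead of materializing millions of tuples in memory. A sketch, reusing the names from the question's code:

import codecs

def log_rows(logFileName):
    # Yield one parameter tuple per line; executemany consumes this lazily.
    # rstrip('\n') keeps the trailing newline out of the last column.
    with codecs.open(logFileName, 'r', encoding='utf-8') as logFile:
        for logLine in logFile:
            yield logLine.rstrip('\n').split('\t')

for logFileName in allLogFilesName:
    output.executemany('INSERT INTO log VALUES (?, ?, ?, ?)', log_rows(logFileName))
connection.commit()
connection.close()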

answered Sep 22 '22 by Fred


Divide your data into chunks on the fly using generator expressions, and make the inserts inside a transaction. Here's a quote from the SQLite optimization FAQ:

Unless already in a transaction, each SQL statement has a new transaction started for it. This is very expensive, since it requires reopening, writing to, and closing the journal file for each statement. This can be avoided by wrapping sequences of SQL statements with BEGIN TRANSACTION; and END TRANSACTION; statements. This speedup is also obtained for statements which don't alter the database.

Here's how your code might look.
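
A sketch along those lines (the chunks helper and the 10,000-row batch size are illustrative; the rest reuses the question's names):

import codecs
import itertools

def chunks(iterable, size=10000):
    # Split any iterable into lists of at most `size` rows.
    it = iter(iterable)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch

for logFileName in allLogFilesName:
    with codecs.open(logFileName, 'r', encoding='utf-8') as logFile:
        rows = (logLine.rstrip('\n').split('\t') for logLine in logFile)
        for batch in chunks(rows):
            # sqlite3 opens a transaction implicitly before the INSERTs;
            # commit() ends it, so each batch costs only one transaction.
            output.executemany('INSERT INTO log VALUES (?, ?, ?, ?)', batch)
            connection.commit()
connection.close()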

Also, SQLite has the ability to import CSV files directly, via the .import command in its command-line shell.
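
That import can also be driven from Python by shelling out to the command-line tool. A sketch; it assumes the sqlite3 binary is on PATH, a tab-separated log file, and an existing log table, all of which are illustrative:

import subprocess

# Each argument after the database name is executed in order by the
# sqlite3 shell, dot-commands included.
subprocess.run(
    ["sqlite3", "logs.db", ".mode tabs", ".import allLogs.tsv log"],
    check=True,
)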

answered Sep 22 '22 by alecxe