Python multiprocessing - write the results in the same file

I have a simple function that writes the output of some calculations to a sqlite table. I would like to run this function in parallel using multiprocessing in Python. My specific question is how to avoid conflicts when each process tries to write its result into the same table. Running the code gives me this error: sqlite3.OperationalError: database is locked.

import sqlite3
from multiprocessing import Pool

conn = sqlite3.connect('test.db')
c = conn.cursor()
c.execute("CREATE TABLE table_1 (id int,output int)")

def write_to_file(a_tuple):
    index = a_tuple[0]
    input = a_tuple[1]
    output = input + 1
    c.execute('INSERT INTO table_1 (id, output) VALUES (?, ?)', (index, output))

if __name__ == "__main__":
    p = Pool()
    results = p.map(write_to_file, [(1,10),(2,11),(3,13),(4,14)])
    p.close()
    p.join()

Traceback (most recent call last):
sqlite3.OperationalError: database is locked
asked Dec 24 '22 by Behzad Jamali

1 Answer

Using a Pool is a good idea.

I see three possible solutions to this problem.

First, instead of having each pool worker try to insert its data into the database, let the workers return the data to the parent process.

In the parent process, use imap_unordered instead of map. This returns an iterator that starts yielding values as soon as they become available. The parent can then insert the data into the database.

This serializes access to the database, preventing the problem.

This solution is preferable if the data to be inserted into the database is relatively small but updates happen very often, i.e. if updating the database takes the same time as or more than calculating the data.
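
A minimal sketch of this approach, reusing the question's test.db and table_1 schema (the worker name calculate is my own):

import sqlite3
from multiprocessing import Pool

def calculate(a_tuple):
    # do only the calculation in the worker; return the row instead of writing it
    index, value = a_tuple
    return (index, value + 1)

if __name__ == "__main__":
    conn = sqlite3.connect('test.db')
    conn.execute("CREATE TABLE IF NOT EXISTS table_1 (id int, output int)")
    with Pool() as p:
        # imap_unordered yields each row as soon as some worker finishes it
        for row in p.imap_unordered(calculate, [(1, 10), (2, 11), (3, 13), (4, 14)]):
            conn.execute('INSERT INTO table_1 (id, output) VALUES (?, ?)', row)
    conn.commit()
    conn.close()

Only the parent process ever touches the database, so no locking is needed.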


Second, you could use a Lock. A worker should then

  • acquire the lock,
  • open the database,
  • insert the values,
  • close the database,
  • release the lock.

This avoids the overhead of sending the data to the parent process, but workers may stall while waiting to write their data into the database.

This would be the preferred solution if the amount of data to be inserted is large but calculating the data takes much longer than inserting it into the database.
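
One way to share the lock is through the pool's initializer; a sketch under that assumption (the names init_worker and write_row are mine):

import sqlite3
from multiprocessing import Pool, Lock

def init_worker(the_lock):
    # runs once in each worker process; stores the shared lock as a global
    global lock
    lock = the_lock

def write_row(a_tuple):
    index, value = a_tuple
    output = value + 1
    with lock:  # only one worker may touch the database at a time
        conn = sqlite3.connect('test.db')
        with conn:  # commits on success, rolls back on error
            conn.execute('INSERT INTO table_1 (id, output) VALUES (?, ?)', (index, output))
        conn.close()

if __name__ == "__main__":
    conn = sqlite3.connect('test.db')
    conn.execute("CREATE TABLE IF NOT EXISTS table_1 (id int, output int)")
    conn.close()
    the_lock = Lock()
    with Pool(initializer=init_worker, initargs=(the_lock,)) as p:
        p.map(write_row, [(1, 10), (2, 11), (3, 13), (4, 14)])

Each worker opens its own connection; a multiprocessing.Lock cannot be passed as a map argument, which is why it goes through initargs.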


Third, you could have each worker write to its own database and merge them afterwards. The merge can be done directly in sqlite or in Python, although with a large amount of data I'm not sure the latter has advantages.
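
A sketch of the per-worker variant, merging with sqlite's ATTACH (the worker_<pid>.db naming scheme is my own):

import sqlite3
from multiprocessing import Pool, current_process

def write_own_db(a_tuple):
    index, value = a_tuple
    # every worker process writes to its own database file
    name = 'worker_{}.db'.format(current_process().pid)
    conn = sqlite3.connect(name)
    conn.execute("CREATE TABLE IF NOT EXISTS table_1 (id int, output int)")
    with conn:
        conn.execute('INSERT INTO table_1 (id, output) VALUES (?, ?)', (index, value + 1))
    conn.close()
    return name

if __name__ == "__main__":
    with Pool() as p:
        # several tasks may run in the same worker process, hence the set()
        names = set(p.map(write_own_db, [(1, 10), (2, 11), (3, 13), (4, 14)]))
    # merge the worker databases into the main one
    main = sqlite3.connect('test.db')
    main.execute("CREATE TABLE IF NOT EXISTS table_1 (id int, output int)")
    for name in names:
        main.execute("ATTACH DATABASE ? AS w", (name,))
        with main:
            main.execute("INSERT INTO table_1 SELECT id, output FROM w.table_1")
        main.execute("DETACH DATABASE w")
    main.close()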

answered Jan 29 '23 by Roland Smith