I have a simple function that writes the output of some calculations to an SQLite table. I would like to run this function in parallel using multiprocessing in Python. My specific question is how to avoid conflicts when each process tries to write its result into the same table. Running the code gives me this error: sqlite3.OperationalError: database is locked.
    import sqlite3
    from multiprocessing import Pool

    conn = sqlite3.connect('test.db')
    c = conn.cursor()
    c.execute("CREATE TABLE table_1 (id int,output int)")

    def write_to_file(a_tuple):
        index = a_tuple[0]
        input = a_tuple[1]
        output = input + 1
        c.execute('INSERT INTO table_1 (id, output)' 'VALUES (?,?)', (index,output))

    if __name__ == "__main__":
        p = Pool()
        results = p.map(write_to_file, [(1,10),(2,11),(3,13),(4,14)])
        p.close()
        p.join()
Traceback (most recent call last):
sqlite3.OperationalError: database is locked
Using a Pool is a good idea.
I see three possible solutions to this problem.
First, instead of having the pool worker trying to insert data into the database, let the worker return the data to the parent process.
In the parent process, use imap_unordered instead of map.
It returns an iterator that yields values as soon as they become available. The parent can then insert the data into the database.
This will serialize the access to the database, preventing the problem.
This solution is preferable if the data to be inserted into the database is relatively small but updates happen very often, that is, if updating the database takes as long as or longer than calculating the data.
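A minimal sketch of this first approach, reusing the table from the question. The names compute and insert_results are hypothetical, chosen only for illustration; the point is that workers only calculate while the parent is the single writer.

```python
import sqlite3
from multiprocessing import Pool


def compute(a_tuple):
    # Worker: calculation only, no database access.
    index, value = a_tuple
    return index, value + 1


def insert_results(conn, results):
    # Runs in the parent only, so there is a single writer.
    conn.execute("CREATE TABLE IF NOT EXISTS table_1 (id int, output int)")
    for row in results:
        conn.execute("INSERT INTO table_1 (id, output) VALUES (?, ?)", row)
    conn.commit()


if __name__ == "__main__":
    inputs = [(1, 10), (2, 11), (3, 13), (4, 14)]
    conn = sqlite3.connect("test.db")
    with Pool() as pool:
        # Rows arrive in completion order, not input order.
        insert_results(conn, pool.imap_unordered(compute, inputs))
    conn.close()
```

Because imap_unordered yields results in completion order, the id column is what ties each row back to its input.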
Second, you could use a Lock. A worker should then acquire the lock, write its data to the database, and release the lock.
This avoids the overhead of sending the data to the parent process, but workers may stall while waiting for their turn to write to the database.
This would be a preferred solution if the amount of data to be inserted is large but it takes much longer to calculate the data than to insert it into the database.
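A rough sketch of the Lock approach, assuming the lock is handed to each worker through the Pool initializer (a Lock cannot be passed as an ordinary map argument). The names init_worker, write_to_db and DB_PATH are hypothetical, and opening a fresh connection per write is just one possible choice.

```python
import sqlite3
from multiprocessing import Lock, Pool

DB_PATH = "test.db"  # hypothetical path, same file as in the question
_lock = None


def init_worker(lock):
    # Runs once in each worker; keeps the shared lock in a module global.
    global _lock
    _lock = lock


def write_to_db(a_tuple, db_path=DB_PATH):
    index, value = a_tuple
    output = value + 1  # do the calculation outside the lock
    with _lock:  # only one process writes at a time
        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO table_1 (id, output) VALUES (?, ?)",
                    (index, output),
                )
        finally:
            conn.close()
    return index


if __name__ == "__main__":
    conn = sqlite3.connect(DB_PATH)
    conn.execute("CREATE TABLE IF NOT EXISTS table_1 (id int, output int)")
    conn.commit()
    conn.close()
    lock = Lock()
    with Pool(initializer=init_worker, initargs=(lock,)) as pool:
        pool.map(write_to_db, [(1, 10), (2, 11), (3, 13), (4, 14)])
```

Each worker opens its own connection inside the lock; a connection created in the parent should not be shared across processes.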
Third, you could have each worker write to its own database, and merge them afterwards. You can do the merge directly in sqlite or even in Python, although with a large amount of data I'm not sure whether doing it in Python has any advantage.
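One way the third approach might look, as a sketch only: each worker writes to a file named after its process ID, and the parent merges the parts with SQLite's ATTACH DATABASE. The function names and the part_<pid>.db and merged.db file names are assumptions made for illustration.

```python
import os
import sqlite3
import tempfile
from functools import partial
from multiprocessing import Pool


def write_own_db(directory, a_tuple):
    # One file per worker process (named by PID), so writers never collide.
    index, value = a_tuple
    path = os.path.join(directory, f"part_{os.getpid()}.db")
    conn = sqlite3.connect(path)
    with conn:  # commits on success
        conn.execute("CREATE TABLE IF NOT EXISTS table_1 (id int, output int)")
        conn.execute(
            "INSERT INTO table_1 (id, output) VALUES (?, ?)", (index, value + 1)
        )
    conn.close()
    return path


def merge(part_paths, target_path):
    # Copy the rows of every part file into the target database.
    conn = sqlite3.connect(target_path)
    conn.execute("CREATE TABLE IF NOT EXISTS table_1 (id int, output int)")
    conn.commit()
    for path in sorted(set(part_paths)):
        conn.execute("ATTACH DATABASE ? AS part", (path,))
        conn.execute("INSERT INTO table_1 SELECT id, output FROM part.table_1")
        conn.commit()  # DETACH is not allowed inside an open transaction
        conn.execute("DETACH DATABASE part")
    conn.close()


if __name__ == "__main__":
    inputs = [(1, 10), (2, 11), (3, 13), (4, 14)]
    with tempfile.TemporaryDirectory() as tmp, Pool() as pool:
        parts = pool.map(partial(write_own_db, tmp), inputs)
        merge(parts, "merged.db")
```

Here the merge runs inside SQLite itself through INSERT ... SELECT, so the rows are never loaded into Python objects.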