Python multiprocessing - write the results in the same file

I have a simple function that writes the output of some calculations to a sqlite table. I would like to run this function in parallel using multiprocessing in Python. My specific question is how to avoid conflicts when each process tries to write its result into the same table. Running the code gives me this error: sqlite3.OperationalError: database is locked.

import sqlite3
from multiprocessing import Pool

conn = sqlite3.connect('test.db')
c = conn.cursor()
c.execute("CREATE TABLE table_1 (id int,output int)")

def write_to_file(a_tuple):
    index = a_tuple[0]
    input = a_tuple[1]
    output = input + 1
    c.execute('INSERT INTO table_1 (id, output) VALUES (?, ?)', (index, output))

if __name__ == "__main__":
    p = Pool()
    results = p.map(write_to_file, [(1,10),(2,11),(3,13),(4,14)])
    p.close()
    p.join()

Traceback (most recent call last):
sqlite3.OperationalError: database is locked
asked Dec 24 '22 by Behzad Jamali

1 Answer

Using a Pool is a good idea.

I see three possible solutions to this problem.

First, instead of having each pool worker try to insert its data into the database, let the workers return the data to the parent process.

In the parent process, use imap_unordered instead of map. This returns an iterator that starts yielding values as soon as they become available. The parent can then insert the data into the database.

This serializes access to the database, preventing the problem.

This solution is preferable if the data to be inserted into the database is relatively small but updates happen very often, i.e. if updating the database takes the same time as or more than calculating the data.
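
A minimal sketch of this approach, reusing the question's test.db and table_1 schema (the worker name calculate is my own):

import sqlite3
from multiprocessing import Pool

def calculate(a_tuple):
    # do only the calculation in the worker; return the row instead of writing it
    index, value = a_tuple
    return (index, value + 1)

if __name__ == "__main__":
    conn = sqlite3.connect('test.db')
    conn.execute("CREATE TABLE IF NOT EXISTS table_1 (id int, output int)")
    with Pool() as p:
        # imap_unordered yields each row as soon as some worker finishes it
        for row in p.imap_unordered(calculate, [(1, 10), (2, 11), (3, 13), (4, 14)]):
            conn.execute('INSERT INTO table_1 (id, output) VALUES (?, ?)', row)
    conn.commit()
    conn.close()

Only the parent process ever touches the database, so no locking is needed.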


Second, you could use a Lock. A worker should then

  • acquire the lock,
  • open the database,
  • insert the values,
  • close the database,
  • release the lock.

This avoids the overhead of sending the data to the parent process, but workers may stall while waiting to write their data into the database.

This would be the preferred solution if the amount of data to be inserted is large but calculating the data takes much longer than inserting it into the database.
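
One way to share the lock is through the pool's initializer; a sketch under that assumption (the names init_worker and write_row are mine):

import sqlite3
from multiprocessing import Pool, Lock

def init_worker(the_lock):
    # runs once in each worker process; stores the shared lock as a global
    global lock
    lock = the_lock

def write_row(a_tuple):
    index, value = a_tuple
    output = value + 1
    with lock:  # only one worker may touch the database at a time
        conn = sqlite3.connect('test.db')
        with conn:  # commits on success, rolls back on error
            conn.execute('INSERT INTO table_1 (id, output) VALUES (?, ?)', (index, output))
        conn.close()

if __name__ == "__main__":
    conn = sqlite3.connect('test.db')
    conn.execute("CREATE TABLE IF NOT EXISTS table_1 (id int, output int)")
    conn.close()
    the_lock = Lock()
    with Pool(initializer=init_worker, initargs=(the_lock,)) as p:
        p.map(write_row, [(1, 10), (2, 11), (3, 13), (4, 14)])

Each worker opens its own connection; a multiprocessing.Lock cannot be passed as a map argument, which is why it goes through initargs.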


Third, you could have each worker write to its own database and merge them afterwards. The merge can be done directly in sqlite or in Python, although with a large amount of data I'm not sure the latter has advantages.
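
A sketch of the per-worker variant, merging with sqlite's ATTACH (the worker_<pid>.db naming scheme is my own):

import sqlite3
from multiprocessing import Pool, current_process

def write_own_db(a_tuple):
    index, value = a_tuple
    # every worker process writes to its own database file
    name = 'worker_{}.db'.format(current_process().pid)
    conn = sqlite3.connect(name)
    conn.execute("CREATE TABLE IF NOT EXISTS table_1 (id int, output int)")
    with conn:
        conn.execute('INSERT INTO table_1 (id, output) VALUES (?, ?)', (index, value + 1))
    conn.close()
    return name

if __name__ == "__main__":
    with Pool() as p:
        # several tasks may run in the same worker process, hence the set()
        names = set(p.map(write_own_db, [(1, 10), (2, 11), (3, 13), (4, 14)]))
    # merge the worker databases into the main one
    main = sqlite3.connect('test.db')
    main.execute("CREATE TABLE IF NOT EXISTS table_1 (id int, output int)")
    for name in names:
        main.execute("ATTACH DATABASE ? AS w", (name,))
        with main:
            main.execute("INSERT INTO table_1 SELECT id, output FROM w.table_1")
        main.execute("DETACH DATABASE w")
    main.close()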

answered Jan 29 '23 by Roland Smith