SQLite3 and Multiprocessing

Tags:

I noticed that sqlite3 isn´t really capable nor reliable when i use it inside a multiprocessing enviroment. Each process tries to write some data into the same database, so that a connection is used by multiple threads. I tried it with the check_same_thread=False option, but the number of insertions is pretty random: Sometimes it includes everything, sometimes not. Should I parallel-process only parts of the function (fetching data from the web), stack their outputs into a list and put them into the table all together or is there a reliable way to handle multi-connections with sqlite?

726

asked Aug 07 '11 00:08

dorvak

3 Answers

First of all, there's a difference between multiprocessing (multiple processes) and multithreading (multiple threads within one process).

It seems that you're talking about multithreading here. There are a couple of caveats that you should be aware of when using SQLite in a multithreaded environment. The SQLite documentation mentions the following:

Do not use the same database connection at the same time in more than one thread.

On some operating systems, a database connection should always be used in the same thread in which it was originally created.

See here for a more detailed information: Is SQLite thread-safe?

answered Oct 10 '22 18:10

omz

sqlitedict: A lightweight wrapper around Python's sqlite3 database, with a dict-like interface and multi-thread access support.

answered Oct 10 '22 18:10

Radim

I've actually just been working on something very similar:

multiple processes (for me a processing pool of 4 to 32 workers)
each process worker does some stuff that includes getting information from the web (a call to the Alchemy API for mine)
each process opens its own sqlite3 connection, all to a single file, and each process adds one entry before getting the next task off the stack

At first I thought I was seeing the same issue as you, then I traced it to overlapping and conflicting issues with retrieving the information from the web. Since I was right there I did some torture testing on sqlite and multiprocessing and found I could run MANY process workers, all connecting and adding to the same sqlite file without coordination and it was rock solid when I was just putting in test data.

So now I'm looking at your phrase "(fetching data from the web)" - perhaps you could try replacing that data fetching with some dummy data to ensure that it is really the sqlite3 connection causing you problems. At least in my tested case (running right now in another window) I found that multiple processes were able to all add through their own connection without issues but your description exactly matches the problem I'm having when two processes step on each other while going for the web API (very odd error actually) and sometimes don't get the expected data, which of course leaves an empty slot in the database. My eventual solution was to detect this failure within each worker and retry the web API call when it happened (could have been more elegant, but this was for a personal hack).

My apologies if this doesn't apply to your case, without code it's hard to know what you're facing, but the description makes me wonder if you might widen your considerations.

answered Oct 10 '22 19:10

Ezekiel Kruglick

Related questions
                            
                                Django query select distinct by field pairs
                            
                                Problem with 2D interpolation in SciPy, non-rectangular grid
                            
                                Dissecting a line of (obfuscated?) Python
                            
                                What is the difference between LIST.append(1) and LIST = LIST + [1] (Python)
                            
                                How to get 'Message-ID' using imaplib
                            
                                What is C#'s version of the GIL?
                            
                                Should I wait for Django to start supporting Python 3?
                            
                                Reading an entire binary file into Python
                            
                                Watching a property for changes
                            
                                How to run Ruby/Python scripts from inside PHP passing and receiving parameters?
                            
                                Get the list of a class's variables & methods in Python
                            
                                python join equivalent
                            
                                Pythonic way to merge two List of tuples into single list of dict
                            
                                Is there a pure Python library for parsing a Windows Registry file?
                            
                                How to log exceptions in appengine?
                            
                                Python: detect duplicates using a set
                            
                                Webpage redirect to the main page with CGI Python
                            
                                elegant way of using a range using an if statement?
                            
                                Python TimedRotatingFileHandler logs to a file and stderr
                            
                                Python zeromq -- Multiple Publishers To a Single Subscriber?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

SQLite3 and Multiprocessing

Tags:

python

sqlite

multiprocessing

dorvak

People also ask

3 Answers

omz

Radim

Ezekiel Kruglick

Recent Activity

Donate For Us