Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Getting SQLite3 to work with multiple threads

I'm making a web crawler in Python that collects redirects/links, adds them to a database, and enters them as a new row if the link doesn't already exist. I want like to use multi-threading but having trouble because I have to check in real time if there is an entry with a given URL.

I was initially using sqlite3 but realised I can't use it simultaneously on different threads. I don't really want to use MySQL (or something similar) as it needs more disk space and runs as separate server. Is there anyway to make sqlite3 work with multiple threads?

like image 284
Val G. Avatar asked Sep 19 '25 20:09

Val G.


1 Answers

The Python sqlite3 module has a threadsafety level of 1, which means that although you can't share database connections between threads, multiple threads can use the module simultaneously. So, you could have each thread create its own connection to the database.

The problem with this approach is that SQLite has poor write concurrency, so having multiple threads doing a ton of INSERTs at once will give you the dreaded “database is locked” error. You can improve things somewhat by using PRAGMA JOURNAL_MODE = 'WAL', but that only goes so far.

If performance is an issue and switching to a client-server database isn't an option, then what you'll probably have to do is keep an in-memory cache of your URLs, and arrange your program so that you have one thread that syncs this cache with the SQLite database.

like image 52
dan04 Avatar answered Sep 22 '25 08:09

dan04