I'm running many (18k) batch jobs on a cluster with a Lustre filesystem. The jobs are all submitted at the same time, each takes about 3 seconds, and each writes its result using the sqlite3 Python module. The write part of the code is very simple:
with sqlite3.connect(name, timeout=900) as conn:
    conn.execute(
        "insert into someTable values (?, ?)", (value1, value2))
but a lot of the jobs will throw an exception:
sqlite3.DatabaseError: database disk image is malformed
and sometimes
sqlite3.OperationalError: unable to open database file
I'm guessing this has something to do with lots of jobs putting a lock on the file when they write to it, but my impression was that sqlite3
should know to wait patiently for the file to be free. Is my error likely a result of too many concurrent writes? How can I fix it?
SQLite does not support storage on a distributed filesystem. Concurrent access requires file locking, and that locking does not work reliably across a networked filesystem like Lustre, which is how the database file ends up corrupted.
You'll have to move to a database that supports a networked client-server model instead, such as MySQL or PostgreSQL.