I'm running a Python program which uses the shelve module on top of pickle. After running this program, sometimes I get a single output file, a.data, but at other times I get three output files: a.data.bak, a.data.dir and a.data.dat.
Why is that?
The shelve module in Python's standard library implements persistent storage for arbitrary picklable Python objects behind a dictionary-like API, which makes it a simple option when a relational database would be overkill. A shelf is a dictionary-like object that is persistently stored on disk, and it is accessed by keys, just as with a dictionary.
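As a quick illustration of that dictionary-like API, here is a minimal sketch that writes a couple of objects to a shelf and reads them back. The file name a.data mirrors the question; putting it in a temporary directory is just an assumption to keep the example self-contained.

```python
import os
import shelve
import tempfile

# Assumption: a throwaway directory; any writable path would do.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "a.data")

# Store arbitrary picklable objects under string keys, like a dict.
with shelve.open(path) as shelf:
    shelf["config"] = {"retries": 3, "hosts": ["alpha", "beta"]}
    shelf["counts"] = [1, 2, 3]

# Reopen later (e.g. in another run of the program) and read them back.
with shelve.open(path) as shelf:
    config = shelf["config"]
    keys = sorted(shelf.keys())

print(config)
print(keys)
```

Which files end up next to `path` depends on the dbm backend shelve picks, which is exactly what the rest of this answer is about.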
There is quite some indirection here. Follow me carefully.
The shelve module is implemented on top of the dbm module. This module acts as a facade for three (*) different specific DBM implementations, and it picks the first module available when creating a new database, in the following order:
- dbm.gnu, the Python module for the GNU DBM library. You would use it directly if you needed the extra functionality it offers over the base dbm module (it lets you iterate over the keys in stored order and 'pack' the database to free up space left by deleted objects).
- dbm.ndbm, a proxy module using the ndbm, BSD DB or GNU DBM library (chosen when Python is compiled).
- dbm.dumb, a pure-Python implementation.
It is this range of choices that makes shelve files appear with different extensions on different platforms.
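To see which of these backends your own Python can actually import, you could try importing each one in the order listed above. This is a hypothetical helper (`available_backends` and `CANDIDATES` are names made up for this sketch), and it assumes the three-backend list from this answer; newer Python releases may consult additional backends, so treat the result as a rough check rather than a reproduction of dbm's own selection logic.

```python
import importlib

# Assumption: the three backends described above, in the order listed.
# Newer Python releases may try additional backends as well.
CANDIDATES = ["dbm.gnu", "dbm.ndbm", "dbm.dumb"]


def available_backends(candidates=CANDIDATES):
    """Return the candidate dbm submodules that can be imported here."""
    found = []
    for name in candidates:
        try:
            importlib.import_module(name)
        except ImportError:
            # dbm.gnu / dbm.ndbm fail to import when the underlying
            # C extension was not built for this Python.
            continue
        found.append(name)
    return found


backends = available_backends()
print("importable backends:", backends)
print("first available candidate:", backends[0])
```

dbm.dumb is pure Python, so it should always be importable and show up last; if it is the only entry, shelve has nothing else to fall back on.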
The dbm.dumb module is the one that adds the .bak, .dat and .dir extensions:
Open a dumbdbm database and return a dumbdbm object. The filename argument is the basename of the database file (without any specific extensions). When a dumbdbm database is created, files with .dat and .dir extensions are created.
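A minimal sketch of using dbm.dumb directly, assuming a throwaway temporary directory, shows the basename-plus-extension behaviour described in that quote: you pass a.data, and the files that appear on disk carry the .dat and .dir suffixes.

```python
import dbm.dumb
import os
import tempfile

workdir = tempfile.mkdtemp()
# Basename only: dbm.dumb appends its own extensions to this name.
base = os.path.join(workdir, "a.data")

db = dbm.dumb.open(base, "c")  # 'c': create the database if it does not exist
db[b"answer"] = b"42"
files_while_open = sorted(os.listdir(workdir))
db.close()

print(files_while_open)
```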
The .dir file is moved to .bak whenever the in-memory index dict is committed to disk: when a key is deleted, or when .sync() or .close() is called. Adding a new key only appends an entry to the existing .dir file.
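The .bak rotation can be observed with a small experiment. This sketch (the temporary paths are an assumption for the example) adds a key, checks for a.data.bak, then calls `sync()`, which for dbm.dumb commits the index to disk.

```python
import dbm.dumb
import os
import tempfile

workdir = tempfile.mkdtemp()
base = os.path.join(workdir, "a.data")

db = dbm.dumb.open(base, "c")
db[b"first"] = b"1"
before_sync = os.path.exists(base + ".bak")

# sync() commits the in-memory index: the old .dir file is renamed to
# .bak and a fresh .dir file is written.
db.sync()
after_sync = os.path.exists(base + ".bak")
db.close()

print("bak before sync:", before_sync)
print("bak after sync:", after_sync)
```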
So if you get these three files, it means that none of the other backends (the other three options for Python 2's anydbm) were available on your platform.
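If you need the same files on every run regardless of how Python was built, one option is to bypass `shelve.open()` and hand `shelve.Shelf` a backend you open yourself. This is a sketch of that idea using dbm.dumb, chosen here only because it is always available; it is not the only reasonable choice, and dbm.dumb is the slowest of the backends.

```python
import dbm.dumb
import os
import shelve
import tempfile

workdir = tempfile.mkdtemp()
base = os.path.join(workdir, "a.data")

# Wrap an explicitly chosen backend in shelve.Shelf instead of calling
# shelve.open(), so the on-disk layout no longer depends on which dbm
# libraries this Python was built with.
with shelve.Shelf(dbm.dumb.open(base, "c")) as shelf:
    shelf["payload"] = {"x": 1}

files = sorted(os.listdir(workdir))
print(files)
```

Reading the data back has to go through the same backend, e.g. `shelve.Shelf(dbm.dumb.open(base, "r"))`.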
The other backends give you other extensions: dbm.gnu writes a single file with exactly the name you passed (just a.data), while dbm.ndbm may use .dir and .pag, or .db, depending on which library it was built against.
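To find out after the fact which backend produced a given shelf, `dbm.whichdb()` inspects the files on disk and returns the module name. The sketch below lets `shelve.open()` choose its default backend and then reports what was used; the temporary directory is an assumption to keep it self-contained, and the output will vary between machines.

```python
import dbm
import os
import shelve
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "a.data")

# Let shelve pick whatever backend is the default on this machine.
with shelve.open(path) as shelf:
    shelf["k"] = "v"

# whichdb() returns a module name such as "dbm.dumb", "" if the format
# is unrecognized, or None if the file(s) cannot be opened.
backend = dbm.whichdb(path)
print("created by:", backend)
print("files on disk:", sorted(os.listdir(workdir)))
```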
(*) Python 2 had four dbm modules: anydbm would default to the deprecated dbhash module, which in turn was built on top of the bsddb module. Both were removed in Python 3.