Why does the shelve module in Python sometimes create files with different extensions?

I'm running a Python program which uses the shelve module on top of pickle. After running this program, I sometimes get a single output file, a.data, but at other times I get three output files: a.data.bak, a.data.dir and a.data.dat.

Why is that?

Ali_IT asked Apr 23 '13 14:04


1 Answer

There is quite some indirection here. Follow me carefully.

The shelve module is implemented on top of the dbm module. This module acts as a facade for three(*) different specific DBM implementations, and it will pick the first module available when creating a new database, in the following order:

  1. dbm.gnu, Python module for the GNU DBM library; you would use it directly if you needed the extra functionality it offers over the base dbm module (it lets you iterate over the keys in stored order and 'pack' the database to free up space from deleted objects).
  2. dbm.ndbm, a proxy module using one of the ndbm, BSD DB or GNU DBM libraries (chosen when Python is compiled).
  3. dbm.dumb, a pure-python implementation.

It is this range of choices that makes shelve files appear to grow extra extensions on different platforms.
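To see which of the backends listed above your own interpreter can actually use, you can simply try importing each one. This is a minimal sketch: the helper name available_backends and the CANDIDATES list are made up for illustration, and the list only mirrors the three Python 3 module names mentioned in this answer.

```python
import importlib

# Backend module names in the order given above (Python 3 naming).
CANDIDATES = ["dbm.gnu", "dbm.ndbm", "dbm.dumb"]


def available_backends(names=CANDIDATES):
    """Return the names from `names` that can be imported here, in order."""
    found = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The C library behind this backend was not compiled in.
            continue
        found.append(name)
    return found


print(available_backends())
```

If only dbm.dumb shows up, new shelve files on that machine will be written by the pure-Python implementation.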

The dbm.dumb module is the one that adds the .bak, .dat and .dir extensions, as its documentation (quoted here under the module's Python 2 name, dumbdbm) states:

Open a dumbdbm database and return a dumbdbm object. The filename argument is the basename of the database file (without any specific extensions). When a dumbdbm database is created, files with .dat and .dir extensions are created.

The .dir file is moved to .bak as new index dicts are committed for changes made to the data structures (when adding a new key, deleting a key, or by calling .sync() or .close()).
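You can reproduce the multi-file layout on any platform by handing shelve a dbm.dumb database explicitly instead of letting dbm choose. A rough sketch follows; the function name dumb_shelf_files and the key stored are only illustrative, and whether a .bak file shows up depends on how many index commits have happened.

```python
import dbm.dumb
import os
import shelve
import tempfile


def dumb_shelf_files(directory, basename="a.data"):
    """Write one key through a dbm.dumb-backed shelf and list the files made."""
    path = os.path.join(directory, basename)
    # shelve.Shelf wraps any dict-like object, so the backend is forced here.
    shelf = shelve.Shelf(dbm.dumb.open(path, "c"))
    try:
        shelf["answer"] = 42
    finally:
        shelf.close()  # commits the index to the .dir file
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    print(dumb_shelf_files(tmp))
```

Note that no file named exactly a.data is created; dbm.dumb only ever writes the suffixed files.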

If you see these files, it means that the other two options for dbm (three for anydbm in Python 2) are not available on your platform, so the pure-Python fallback was used.

The other formats may give you other extensions. The dbm.ndbm module (named dbm in Python 2) may use .dir, .pag or .db, depending on what library was used for that module.
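If you are unsure which implementation wrote an existing shelf, dbm.whichdb() can usually tell you from the base filename. The sketch below creates a throwaway shelf with whatever backend the local Python picks and then asks whichdb about it; the helper name backend_of_new_shelf is hypothetical, and the result will differ between machines.

```python
import dbm
import os
import shelve
import tempfile


def backend_of_new_shelf(directory, basename="a.data"):
    """Create a shelf with the default backend; report backend and files."""
    path = os.path.join(directory, basename)
    with shelve.open(path) as shelf:
        shelf["key"] = "value"
    # whichdb takes the same base name that was passed to shelve.open().
    return dbm.whichdb(path), sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    backend, files = backend_of_new_shelf(tmp)
    print(backend, files)
```

Running this on the machines that produce a.data and on those that produce the three-file layout should show a different backend name on each.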


(*) Python 2 had four dbm modules; its anydbm module would default to the deprecated dbhash module, which in turn was built on top of the bsddb module. Both were removed in Python 3.

Martijn Pieters answered Oct 11 '22 14:10