I've been saving a bunch of dictionaries to file using Python's shelve module (with Python 3.4 on OS X 10.9.5). Each key is a string of an int (e.g., "84554"), and each value is a dictionary of dictionaries of a few smallish strings.
No keys are used twice, and I know the total superset of all possible keys. I am adding these key-value pairs to the shelf via threads, and which keys/values get added changes each time I run it (which is expected).
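One thing worth ruling out with threaded writes: the shelve documentation states that the module does not support concurrent read/write access to shelved objects. The sketch below is one way to keep threads while letting only a single thread touch the shelf. The names build_entry, results, and shelf_path are hypothetical and not from my actual code; the workers only build values, and the main thread does all the writing.

```python
import os
import queue
import shelve
import tempfile
import threading


def build_entry(key, out_queue):
    """Worker: build a value and hand it to the writer through the queue."""
    # Hypothetical payload shaped like the real data (dict of dicts of strings).
    out_queue.put((key, {"outer": {"inner": key}}))


results = queue.Queue()
workers = [
    threading.Thread(target=build_entry, args=(str(i), results))
    for i in range(50)
]
for w in workers:
    w.start()
for w in workers:
    w.join()

# Assumption: a throwaway shelf in a temp directory, not the real "first_3" shelf.
shelf_path = os.path.join(tempfile.mkdtemp(), "demo_shelf")
with shelve.open(shelf_path) as db:
    # Only the main thread ever reads from or writes to the shelf.
    while not results.empty():
        key, value = results.get()
        db[key] = value
    stored_keys = sorted(db.keys(), key=int)

print(len(stored_keys))
```

This does not by itself explain the mismatch described below, but it removes concurrent access as a variable.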
The problem I've been having is that the number of keys I get by iterating over shelf.keys() and the number of unique keys for which key in shelf.keys() is True are different.
Here's my code. I first initialize things and load ids, which is the list of all possible keys.
import shelve
from custom_code import *
MAIN_PATH = "/Users/myname/project_path/src/"
ids = list(set(load_list(MAIN_PATH + "id_list.pkl")))
c = c2 = 0
good_keys = []
bad_keys = []
I then open the shelf and count the keys that I iterate through with db.keys(), adding these "good" keys to a list.
db = shelve.open(MAIN_PATH + "first_3")
for k in db.keys():
    c2 += 1
    good_keys.append(k)
Then I check each possible key to see whether it exists in the shelf, counting and collecting the ones that do.
for j in set(ids):
    if j in db.keys():
        c += 1
        bad_keys.append(j)
The two counters, c and c2, should be the same, but doing:
print("With `db.keys()`: {0}, with verifying from the list: {1}".format(c2, c))
yields:
With `db.keys()`: 628, with verifying from the list: 669
I then look at keys that were in bad_keys but not in good_keys (the ones collected by iterating db.keys()) and pick an example.
odd_men_out = list( set(bad_keys).difference( set(good_keys) ) )
bad_key = odd_men_out[0]
print(bad_key) # '84554'
I then check the following:
print(bad_key in db.keys()) # True
print(bad_key in db) # True
print(db[bad_key]) # A dictionary of dictionaries that wraps ~12ish lines
print(bad_key in list(db.keys())) # False
Note that last check. Does anybody know what gives? I thought shelve was supposed to be easy, but it's been giving me complete hell.
Perhaps unrelatedly (but perhaps not), when I let an even greater number of entries accumulate in the shelf and try to do something like for k in db.keys() or list(db.keys()), I get the following error:
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/_collections_abc.py", line 482, in __iter__
    yield from self._mapping
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/shelve.py", line 95, in __iter__
    for k in self.dict.keys():
SystemError: Negative size passed to PyBytes_FromStringAndSize
But I can still access the data by trying all possible keys. Evidently that's because I'm not using gdbm?
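To find out which dbm backend a shelf actually uses, the standard library's dbm.whichdb() can be pointed at the shelf path. This is a small sketch on a throwaway shelf; shelf_path is a hypothetical temp location, and the result depends on which dbm modules the local Python build provides.

```python
import dbm
import os
import shelve
import tempfile

# Assumption: a throwaway shelf in a temp directory stands in for the real one.
shelf_path = os.path.join(tempfile.mkdtemp(), "demo_shelf")
with shelve.open(shelf_path) as db:
    db["84554"] = {"outer": {"inner": "value"}}

# whichdb() returns the backend module name as a string (for example
# "dbm.gnu", "dbm.ndbm" or "dbm.dumb"), None if the file cannot be read,
# or "" if the format cannot be guessed.
backend = dbm.whichdb(shelf_path)
print(backend)
```

Running the same call against the real shelf path (without any file extension) should show whether it was created with the ndbm backend rather than gdbm.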
When I tried to save some numpy arrays with more than 1000 elements in my shelf, it would only save some and completely skip others, without generating an error.
Apparently this is an issue when using shelve on Mac OS X; here are some bug reports: https://bugs.python.org/issue33074 and https://bugs.python.org/issue30388.
The only easy solution I found was to use pickle instead of shelve.
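A minimal sketch of the pickle route, assuming the whole collection fits in memory as one ordinary dict. The names records and pickle_path are made up for illustration. Unlike a shelf, the dict is written and read back in one piece rather than key by key.

```python
import os
import pickle
import tempfile

# Hypothetical data shaped like the question's: string keys, dict-of-dict values.
records = {
    "84554": {"outer": {"inner": "value"}},
    "84555": {"outer": {"inner": "other"}},
}

# Assumption: a temp file stands in for the real output path.
pickle_path = os.path.join(tempfile.mkdtemp(), "records.pkl")

# Write the whole dict in a single dump.
with open(pickle_path, "wb") as f:
    pickle.dump(records, f, protocol=pickle.HIGHEST_PROTOCOL)

# Read it back; the key set comes from a plain dict, not a dbm file.
with open(pickle_path, "rb") as f:
    loaded = pickle.load(f)

print(sorted(loaded) == sorted(records))
```

The trade-off is that every update means rewriting the whole file, so this suits data that is built once and then read.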