 

Python shelve having items that aren't listed

I've been saving a bunch of dictionaries to file using Python's shelve module (with Python 3.4 on OSX 10.9.5). Each key is a string of an int (e.g., "84554"), and each value is a dictionary of dictionaries of a few smallish strings.

No key is used twice, and I know the full superset of all possible keys. I am adding these key-value pairs to the shelf from several threads, and which keys/values get added changes each time I run it (which is expected).
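The shelve documentation states that the module does not support concurrent read/write access to shelved objects. So if the threads write to the shelf without any synchronization, that alone could plausibly corrupt the underlying database file. A minimal sketch, assuming a single shared shelf: every write is serialized through one threading.Lock. The names fetch_record and fill_shelf are hypothetical stand-ins for the asker's real per-key work, not part of any library.

```python
import os
import shelve
import tempfile
import threading


def fetch_record(key):
    # Hypothetical stand-in for the real per-key work done in each thread.
    return {"meta": {"id": key, "source": "example"}}


def fill_shelf(path, keys, n_threads=4):
    """Fill the shelf at `path` from several threads, one writer at a time."""
    lock = threading.Lock()  # a single lock guards every write to the shelf
    db = shelve.open(path)
    try:
        def worker(chunk):
            for key in chunk:
                value = fetch_record(key)  # slow work happens outside the lock
                with lock:
                    db[key] = value        # only one thread touches the dbm at a time

        chunks = [keys[i::n_threads] for i in range(n_threads)]
        threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        db.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "first_3")
        keys = [str(n) for n in range(84550, 84560)]
        fill_shelf(path, keys)
        with shelve.open(path) as db:
            print(sorted(db.keys()) == sorted(keys))  # True
```

An alternative with the same effect is to let the threads only compute values and hand them to a single writer thread through a queue.Queue, so the shelf is only ever touched from one thread.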

The problem is that the number of keys I can iterate over with shelve's shelf.keys() differs from the number of unique candidate keys for which key in shelf.keys() is true.

Here's my code. I first initialize things and load ids, which is the list of all possible keys.

import shelve
from custom_code import *  # the asker's own module; provides load_list()
MAIN_PATH = "/Users/myname/project_path/src/"
ids = list(set(load_list(MAIN_PATH + "id_list.pkl")))  # all possible keys, deduplicated
c = c2 = 0
good_keys = []
bad_keys = []

I then open the shelf and count the keys I get by iterating over db.keys(), adding each of these "good" keys to a list.

db = shelve.open(MAIN_PATH + "first_3")
for k in db.keys():  # keys yielded by iterating over the shelf
    c2 += 1
    good_keys.append(k)

Then I go through every possible key, check whether it is in the shelf, and record the matches the same way.

for j in set(ids):
    if j in db.keys():  # direct membership test for each candidate key
        c += 1
        bad_keys.append(j)

The two counters, c and c2, should be the same, but doing:

print("With `db.keys()`: {0}, with verifying from the list: {1}".format(c2, c))    

yields:

With `db.keys()`: 628, with verifying from the list: 669

I then look at keys that are in bad_keys but not in good_keys (i.e., keys that were never yielded by db.keys()) and pick one as an example.

odd_men_out = list( set(bad_keys).difference( set(good_keys) ) )
bad_key = odd_men_out[0] 
print(bad_key) # '84554'
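The two counting loops and the set difference can be folded into one diagnostic. This is a minimal sketch; compare_shelf_keys is a hypothetical helper name. For a healthy shelf, the hidden set comes back empty. In the situation described here, it would hold the 41 keys (669 minus 628) that lookups find but iteration never yields.

```python
import os
import shelve
import tempfile


def compare_shelf_keys(path, candidate_keys):
    """Return (iterated, found, hidden) key sets for the shelf at `path`.

    iterated: keys produced by walking db.keys()
    found:    candidates for which a direct lookup (`key in db`) succeeds
    hidden:   keys reachable by lookup but never yielded by iteration
    """
    candidates = set(candidate_keys)
    with shelve.open(path, flag="r") as db:          # read-only: don't touch the file
        iterated = set(db.keys())                    # walks the underlying dbm's keys
        found = {k for k in candidates if k in db}   # one direct lookup per candidate
    return iterated, found, found - iterated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "first_3")
        with shelve.open(path) as db:
            db["84554"] = {"a": {"b": "c"}}
        iterated, found, hidden = compare_shelf_keys(path, ["84554", "12345"])
        print(len(iterated), len(found), sorted(hidden))  # 1 1 []
```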

I then check the following:

print(bad_key in db.keys()) # True
print(bad_key in db)  # True
print(db[bad_key]) # A dictionary of dictionaries that wraps ~12ish lines
print(bad_key in list(db.keys())) # False

Note that last check. Does anybody know what's going on? I thought shelve was supposed to be easy, but it's been giving me complete hell.
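The disagreement between the last two checks follows from how the standard library implements keys(). shelve.Shelf does not define keys() itself. It inherits it from collections.abc.Mapping, which returns a KeysView. A membership test on a KeysView simply asks the mapping; for a Shelf, that is a direct lookup of the encoded key in the underlying dbm object. By contrast, list(db.keys()) has to iterate, and iteration goes through Shelf.__iter__ and the dbm object's own keys(), the same path that appears in the traceback quoted below. A dbm file whose key enumeration is damaged can therefore still answer lookups correctly. The toy class here (LossyMapping is hypothetical, not part of any library) fakes such a damaged enumeration and reproduces the same four results:

```python
from collections.abc import Mapping


class LossyMapping(Mapping):
    """Toy mapping whose iteration silently skips one stored key.

    It stands in for a damaged dbm file: lookups still work, but the
    key enumeration that iteration relies on no longer reports every key.
    """

    def __init__(self, data, hidden_key):
        self._data = dict(data)
        self._hidden = hidden_key

    def __getitem__(self, key):
        return self._data[key]  # direct lookup, like db[key]

    def __iter__(self):
        # Iteration walks the stored keys but drops the hidden one.
        return (k for k in self._data if k != self._hidden)

    def __len__(self):
        return len(self._data)


m = LossyMapping({"84554": {"a": {}}, "1": {}}, hidden_key="84554")
print("84554" in m.keys())        # True: KeysView.__contains__ asks the mapping
print("84554" in m)               # True: Mapping.__contains__ tries m[key]
print(m["84554"])                 # {'a': {}}
print("84554" in list(m.keys()))  # False: list() iterates, so the key is skipped
```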

Perhaps unrelatedly (but perhaps not), when I let an even greater number of entries accumulate in the shelf and try to do something like for k in db.keys() or list(db.keys()), I get the following error:

  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/_collections_abc.py", line 482, in __iter__
    yield from self._mapping
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/shelve.py", line 95, in __iter__
    for k in self.dict.keys():
SystemError: Negative size passed to PyBytes_FromStringAndSize

I can still access the data by trying all possible keys, though. Is that because I'm not using gdbm?
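shelve.open() hands the file to dbm.open(). When it creates a new file, dbm.open() uses the first backend it can import from its preference list, which in Python 3.4 is dbm.gnu, then dbm.ndbm, then dbm.dumb. Without gdbm, that means dbm.ndbm. Whether ndbm is really the culprit here is an assumption. A minimal sketch of how to check which backend wrote a shelf, and how to bypass the default choice by wrapping a dbm.dumb database in shelve.Shelf, is shown here. The helper names shelf_backend and open_dumb_shelf are hypothetical. dbm.dumb is pure Python and slow, but it does not depend on the platform's ndbm library.

```python
import dbm
import dbm.dumb
import os
import shelve
import tempfile


def shelf_backend(path):
    """Report which dbm module created the shelf file at `path`.

    Returns e.g. 'dbm.gnu', 'dbm.ndbm' or 'dbm.dumb'; None if the file
    cannot be opened, '' if its format is not recognized.
    """
    return dbm.whichdb(path)


def open_dumb_shelf(path, flag="c"):
    # Skip dbm.open()'s backend choice: wrap a pure-Python dbm.dumb
    # database directly in a Shelf.
    return shelve.Shelf(dbm.dumb.open(path, flag))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "first_3")
        with open_dumb_shelf(path) as db:
            db["84554"] = {"a": {"b": "c"}}
        print(shelf_backend(path))  # dbm.dumb
        with open_dumb_shelf(path, "r") as db:
            print(list(db.keys()))  # ['84554']
```

Running shelf_backend() on the existing "first_3" file would show which backend the original shelf actually used.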

asked Nov 08 '22 08:11 by Zeke



1 Answer

When I tried to save some numpy arrays with more than 1000 elements in my shelf, it would only save some of them and silently skip the others, without raising an error.

Apparently this is an issue when using shelve on Mac OS X; here are some bug reports: https://bugs.python.org/issue33074, https://bugs.python.org/issue30388.

The only easy solution I found was to use pickle instead of shelve.
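A minimal sketch of that pickle-based approach follows; save_records and load_records are hypothetical names. The whole dictionary is written and read in one go. The trade-off is that the entire collection must fit in memory and every save rewrites the file. For a few hundred entries, such as the 628 to 669 keys in the question, that should be acceptable. The file is first written under a temporary name and then moved into place with os.replace(), so an interrupted save does not leave a half-written file at the target path.

```python
import os
import pickle
import tempfile


def save_records(path, records):
    # Write to a temporary file, then atomically replace the target.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(records, f, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp_path, path)


def load_records(path):
    # Read the whole dict back in one call.
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "first_3.pkl")
        records = {"84554": {"a": {"b": "c"}}}
        save_records(path, records)
        print(load_records(path) == records)  # True
```

With threads, one option is to have each thread add its results to a shared dict under a lock and call save_records() once, from a single thread, at the end.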

answered Nov 15 '22 10:11 by Gralhos
