I have a python script that is working with a large (~14gb) textfile. I end up with a dictionary of keys and values, but I am getting a memory error when I try to sort the dictionary by value.
I know the dictionary is too big to load into memory and then sort, but how could I go about accomplishing this?
You can use an ordered key/value store like wiredtiger, leveldb, bsddb. All of them support ordered keys using custom sort function. leveldb is the easiest to use but if you use python 2.7, bsddb
is included in the stdlib. If you only require lexicographic sorting you can use the raw hashopen
function to open a persistent sorted dictionary:
from bsddb import hashopen
db = hashopen('dict.db')
db['020'] = 'twenty'
db['002'] = 'two'
db['value'] = 'value'
db['key'] = 'key'
print(db.keys())
This outputs
>>> ['002', '020', 'key', 'value']
Don't forget to close the db after your work:
db.close()
Mind the fact that hashopen configuration might not suit your need. In this case I recommend you use leveldb which has a simple API or wiredtiger for speed.
To order by value in bsddb, you have to use the composite key pattern or key composition. Which boils down to create a dictionary key which keeps the ordering you look for. In this example we pack the original dict value first (so that small values appears first) with the original dict key (so that the bsddb key is unique):
import struct
from bsddb import hashopen
my_dict = {'a': 500, 'abc': 100, 'foobar': 1}
# insert
db = hashopen('dict.db')
for key, value in my_dict.iteritems():
composite_key = struct.pack('>Q', value) + key
db[composite_key] = '' # value is not useful in this case but required
db.close()
# read
db = hashopen('dict.db')
for key, _ in db.iteritems(): # iterate over database
size = struct.calcsize('>Q')
# unpack
value, key = key[:size], key[size:]
value = struct.unpack('>Q', value)[0]
print key, value
db.close()
This outputs the following:
foobar 1
abc 100
a 500
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With