I used pickle.dump() to generate a file of about 5 GB. Loading this file takes about half a day and about 50 GB of RAM. My question is whether it is possible to read this file entry by entry (one at a time) rather than loading it all into memory, or whether you have any other suggestion for how to access the data in such a file.
Many thanks.
If you just want to work with a dictionary larger than memory can hold, the shelve module is a good quick-and-dirty solution. It acts like an in-memory dict, but stores its entries on disk rather than in memory. shelve is built on pickle (cPickle in Python 2), so be sure to set the protocol to something other than 0.
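As a rough sketch of the shelve approach, the snippet below writes a couple of entries and then reads one of them back without loading the others. The path `store_path` and the sample keys and values are made up for illustration, and the `with` form assumes Python 3.4 or later.

```python
import os
import shelve
import tempfile

# Hypothetical location for the on-disk store; any writable path works.
store_path = os.path.join(tempfile.mkdtemp(), "big_dict")

# Write entries one at a time; each value is pickled separately.
# Keys must be strings.
with shelve.open(store_path, protocol=2) as db:
    db["A"] = [1, 2, 3]
    db["B"] = {"nested": "value"}

# Reopen read-only and fetch a single entry by key.
with shelve.open(store_path, flag="r") as db:
    value_a = db["A"]
    keys = sorted(db.keys())

print(value_a)
print(keys)
```

Converting an existing 5 GB pickle into a shelf still requires loading it once, but later runs only touch the keys they ask for.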
Python pickle load: retrieving pickled data is quite simple. You use the pickle.load() function. Its primary argument is the file object that you get by opening the file in read-binary (rb) mode.
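One way to get entry-by-entry reading with plain pickle, sketched here under the assumption that you can rewrite the file: dump each (key, value) pair with its own pickle.dump() call, then call pickle.load() repeatedly until the file is exhausted. This does not work on a file produced by a single pickle.dump() of the whole dictionary. The names `records`, `path`, and `iter_records` are hypothetical.

```python
import os
import pickle
import tempfile

# Toy data standing in for the real dictionary.
records = {"A": 1, "B": 2, "C": 3}
path = os.path.join(tempfile.mkdtemp(), "records.pkl")

# Write one pickle per entry into the same file.
with open(path, "wb") as f:
    for item in records.items():
        pickle.dump(item, f, protocol=pickle.HIGHEST_PROTOCOL)


def iter_records(filename):
    """Yield the pickled objects in the file one at a time."""
    with open(filename, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                # pickle.load raises EOFError once the file is exhausted.
                return


loaded = list(iter_records(path))
print(loaded)
```

In real use you would loop over `iter_records(path)` directly instead of building a list, so only one entry is in memory at a time.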
There is in principle no size limitation to a dictionary in Python, except the capacity of your available memory (RAM + Swap space).
Speed: pickle can be slow, particularly with the old text protocol 0, while JSON is often fast for plain data; the difference comes from the serialization method. Security: pickle is not secure, JSON is much safer. Only unpickle data that you trust; a pickle stream can instruct the loader to call functions, and those calls may be malicious.
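To make the security point concrete, here is a deliberately harmless sketch. The class `Demo` is hypothetical; its `__reduce__` method tells pickle to rebuild the object by calling `len("abc")`, so unpickling returns the result of that call instead of a `Demo` instance. A malicious file could name a dangerous function in the same way.

```python
import pickle


class Demo:
    """Hypothetical class whose pickle asks the loader to call a function."""

    def __reduce__(self):
        # (callable, args): the loader calls callable(*args) when unpickling.
        return (len, ("abc",))


payload = pickle.dumps(Demo())

# Loading the bytes runs len("abc") and returns its result.
result = pickle.loads(payload)
print(result)
```

Here `result` is the integer 3, which shows that the loader executed a call chosen by whoever produced the bytes.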
There is absolutely no question that this should be done using a database rather than pickle; databases are designed for exactly this kind of problem.
Here is some code to get you started, which puts a dictionary into an SQLite database and shows an example of retrieving a value. To get this to work with your actual dictionary rather than my toy example, you'll need to learn more about SQL, but fortunately there are many excellent resources available online. In particular, you might want to learn how to use SQLAlchemy, which is an "Object Relational Mapper" that can make working with databases as intuitive as working with objects.
import sqlite3

# a toy stand-in for an enormous dictionary too big to be stored in pickle
my_huge_dictionary = {"A": 1, "B": 2, "C": 3, "D": 4}

# create (or open) a database in the file my.db
conn = sqlite3.connect('my.db')
c = conn.cursor()

# Create a table with two columns: k and v (for key and value). Here your key
# is assumed to be a string of length 10 or less, and your value is assumed
# to be an integer. I'm sure this is NOT the structure of your dictionary;
# you'll have to read up on SQL data types.
c.execute("""
    create table if not exists dictionary (
        k char(10) NOT NULL,
        v integer NOT NULL,
        PRIMARY KEY (k))
""")

# Dump your enormous dictionary into the database. This will take a while for
# your large dictionary, but you should do it only once, and then in the future
# make changes to your database rather than to a pickled file.
# The "?" placeholders let sqlite3 quote the values safely.
for k, v in my_huge_dictionary.items():
    c.execute("insert or replace into dictionary values (?, ?)", (k, v))
conn.commit()  # without a commit the inserted rows are not saved

# retrieve a value from the database
my_key = "A"
c.execute("select v from dictionary where k = ?", (my_key,))
my_value = c.fetchone()[0]
print(my_value)
conn.close()
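If your values are arbitrary Python objects rather than integers, one possible variation (an assumption on my part, not part of the code above) is to pickle each value separately and store it in a BLOB column, so a lookup unpickles only the entry you ask for. The table name `kv`, the helper `lookup`, and the sample data are all hypothetical, and an in-memory database is used to keep the sketch self-contained.

```python
import pickle
import sqlite3

# Toy data with non-integer values.
data = {"A": [1, 2, 3], "B": {"x": 1.5}}

# ":memory:" keeps the sketch self-contained; use a file path for real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v BLOB NOT NULL)")

# Pickle each value on its own so entries can be read back independently.
conn.executemany(
    "INSERT INTO kv (k, v) VALUES (?, ?)",
    ((k, pickle.dumps(v, protocol=pickle.HIGHEST_PROTOCOL)) for k, v in data.items()),
)
conn.commit()


def lookup(connection, key):
    """Fetch and unpickle the value stored under key."""
    row = connection.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    if row is None:
        raise KeyError(key)
    return pickle.loads(row[0])


value_b = lookup(conn, "B")
print(value_b)
conn.close()
```

The usual pickle caveat still applies: only unpickle blobs from a database you trust.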
Good luck!