I often work with comma- or tab-separated data files that look like this:
key1,1,2.02,hello,4
key2,3,4.01,goodbye,6
...
I might read and pre-process this in Python into a list of lists, like this:
[ [ 'key1', 1, 2.02, 'hello', 4 ], [ 'key2', 3, 4.01, 'goodbye', 6 ] ]
Sometimes, I like saving this list of lists as a pickle, since it preserves the different types of my entries. If the pickled file is big, though, it would be great to read this list of lists back in a streaming fashion.
In Python, to load a text file as a stream, I use the following to print out each line:
with open( 'big_text_file.txt' ) as f:
    for line in f:
        print line
Can I do something similar for a Python list, i.e.:
import pickle
with open( 'big_pickled_list.pkl' ) as p:
    for entry in pickle.load_streaming( p ): # note: pickle.load_streaming doesn't exist
        print entry
Is there a pickle function like "load_streaming"?
This would work.
What it does, however, is unpickle one object from the file and then print the rest of the file's content to stdout.
What you could do is something like:
import cPickle
with open( 'big_pickled_list.pkl', 'rb' ) as p:  # pickle files should be opened in binary mode
    try:
        while True:
            print cPickle.load(p)
    except EOFError:
        pass
That would unpickle all objects from the file until reaching EOF.
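Note that the loop above only yields the entries one by one if the file was written that way, i.e. with one dump call per entry rather than a single dump of the whole list. Here is a minimal sketch of both sides in Python 3 syntax (so pickle instead of cPickle). The temporary path and the sample rows are just placeholders for illustration:

```python
import os
import pickle
import tempfile

rows = [
    ["key1", 1, 2.02, "hello", 4],
    ["key2", 3, 4.01, "goodbye", 6],
]

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "big_pickled_list.pkl")

# Write side: one pickle.dump() call per entry, all into the same file.
with open(path, "wb") as out:
    for row in rows:
        pickle.dump(row, out, protocol=pickle.HIGHEST_PROTOCOL)

# Read side: each pickle.load() call returns the next entry.
loaded = []
with open(path, "rb") as p:
    while True:
        try:
            loaded.append(pickle.load(p))
        except EOFError:
            break  # no more pickled objects in the file

print(loaded == rows)
```

If the list was saved with a single pickle.dump(rows, out), the first load returns the entire list at once, so there is nothing left to stream.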
If you want something that works like "for line in f:", you can wrap this up easily:
def unpickle_iter(file):
    try:
        while True:
            yield cPickle.load(file)
    except EOFError:
        return  # ends the generator; raising StopIteration here is an error under PEP 479
Now you can just do this:
with open('big_pickled_list.pkl', 'rb') as file:
    for item in unpickle_iter(file):
        pass  # use item ...
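Because this is a generator, nothing is unpickled until the loop asks for the next item, so you can stop early without reading the whole file. A small sketch in Python 3 syntax, using an in-memory io.BytesIO buffer as a stand-in for the big file (the buffer and the generated entries are assumptions for illustration only):

```python
import io
import itertools
import pickle


def unpickle_iter(file):
    """Yield pickled objects from a binary file object, one at a time."""
    while True:
        try:
            yield pickle.load(file)
        except EOFError:
            return  # end of file reached, stop the generator


# In-memory stand-in for a large file holding one pickled entry per dump call.
buffer = io.BytesIO()
for n in range(1000):
    pickle.dump(["key%d" % n, n], buffer)
buffer.seek(0)

# Only the first three entries are unpickled; the rest of the buffer is untouched.
first_three = list(itertools.islice(unpickle_iter(buffer), 3))
print(first_three)
```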
To follow up on a comment I made on the accepted solution, I recommend a loop more like this:
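import io
import cPickle
# io.open returns a buffered reader, which provides peek(); binary mode is required
with io.open( 'big_pickled_list.pkl', 'rb' ) as p:
    while p.peek(1):
        print cPickle.load(p)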
This way you'll continue to get the EOFError exception if there is a corrupted object in the file.
For completeness:
def unpickle_iter(file):
    while file.peek(1):
        yield cPickle.load(file)
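To illustrate the point about corrupted files, here is a rough sketch in Python 3 syntax that writes one complete entry followed by a deliberately truncated one, then reads the file back with the peek-based generator. The file name and the truncation are made up for the demo. As far as I can tell, in Python 3 a cut-off object may surface as either EOFError or pickle.UnpicklingError depending on where the data ends, so the sketch catches both:

```python
import os
import pickle
import tempfile


def unpickle_iter(file):
    # file must be a buffered binary reader, e.g. from open(path, "rb"),
    # because peek() is provided by io.BufferedReader.
    while file.peek(1):
        yield pickle.load(file)


# Hypothetical scratch file containing one good entry and one truncated entry.
path = os.path.join(tempfile.mkdtemp(), "truncated.pkl")
with open(path, "wb") as out:
    pickle.dump(["key1", 1, 2.02, "hello", 4], out)
    second = pickle.dumps(["key2", 3, 4.01, "goodbye", 6])
    out.write(second[:-5])  # simulate a write that was cut short

items = []
error = None
with open(path, "rb") as p:
    try:
        for item in unpickle_iter(p):
            items.append(item)
    except (EOFError, pickle.UnpicklingError) as exc:
        error = exc  # the damaged trailing object is reported, not silently skipped

print(items)
print(type(error).__name__)
```

With the try/except EOFError version, an EOFError raised by a damaged last object would be indistinguishable from a normal end of file.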