How to dill (pickle) to file?

Tags:

python

dill

The question may seem a little basic, but I wasn't able to find anything that I understood on the internet. How do I store something that I pickled with dill?

I have come this far for saving my construct (pandas DataFrame, which also contains custom classes):

import dill
dill_file = open("data/2017-02-10_21:43_resultstatsDF", "wb")
dill_file.write(dill.dumps(resultstatsDF))
dill_file.close()

and for reading

dill_file = open("data/2017-02-10_21:43_resultstatsDF", "rb")
resultstatsDF_out = dill.load(dill_file.read())
dill_file.close()

but when reading I get the error

TypeError: file must have 'read' and 'readline' attributes

How do I do this?

EDIT for future readers: After using this approach (pickling my DataFrame) for a while, I now refrain from it. As it turns out, changes between program versions (including changes to the classes of objects stored in the dill file) can make a pickled file unrecoverable. I now make sure that everything I want to save can be expressed as a string (as efficiently as possible), ideally a human-readable one. I store my data as CSV, and objects in CSV cells can be represented as JSON. That way my files will remain readable in the months and years to come. Even if the code changes, I can rewrite the encoders by parsing the strings, and I can understand the CSV by inspecting it manually.
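A minimal sketch of the CSV-with-JSON-cells idea from the edit above, using only the standard library. The column names (`run`, `params`) and the row contents are hypothetical, and an in-memory buffer stands in for a real file:

```python
import csv
import io
import json

# Hypothetical rows: one plain column and one column holding a nested object.
rows = [
    {"run": "a", "params": {"lr": 0.1, "layers": [8, 4]}},
    {"run": "b", "params": {"lr": 0.01, "layers": [16]}},
]

# StringIO stands in for a file opened with open(path, "w", newline="").
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["run", "params"])
writer.writeheader()
for row in rows:
    # Encode the nested object as a JSON string; csv handles the quoting.
    writer.writerow({"run": row["run"], "params": json.dumps(row["params"], sort_keys=True)})

# Read the CSV back and decode the JSON cells.
buffer.seek(0)
restored = [
    {"run": r["run"], "params": json.loads(r["params"])}
    for r in csv.DictReader(buffer)
]
```

Custom classes would need their own conversion to and from plain dicts before `json.dumps` and after `json.loads`; that part depends on the classes involved and is not shown here.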

asked Feb 10 '17 20:02

Make42

1 Answer

dill.load expects an open file object, not the bytes read from it, so just give it the file without calling read():

resultstatsDF_out = dill.load(dill_file)

You can also dump to a file directly, like this:

with open("data/2017-02-10_21:43_resultstatsDF", "wb") as dill_file:
    dill.dump(resultstatsDF, dill_file)

So:

dill.dump(obj, open_file)

writes to a file directly. Whereas:

dill.dumps(obj)

serializes obj to a bytes object, which you can then write to a file yourself.

Likewise:

dill.load(open_file)

reads from a file, and:

dill.loads(serialized_obj)

constructs an object from a serialized object, which you could have read from a file.
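As a rough sketch, the four functions can be compared side by side in one round trip. The object and the temporary file path here are hypothetical, and the fallback to pickle is only an assumption so the snippet also runs where dill is not installed (pickle offers functions with the same four names):

```python
import os
import tempfile

try:
    import dill
except ImportError:
    # Assumption: dill is not installed. pickle exposes the same four
    # function names, which is enough for this plain-dict example.
    import pickle as dill

obj = {"name": "run-1", "scores": [0.25, 0.5, 1.0]}  # hypothetical data

# dumps/loads work with bytes in memory.
payload = dill.dumps(obj)
from_bytes = dill.loads(payload)

# dump/load work with an open binary file object.
path = os.path.join(tempfile.mkdtemp(), "obj.dill")
with open(path, "wb") as f:
    dill.dump(obj, f)
with open(path, "rb") as f:
    from_file = dill.load(f)

# Writing the bytes yourself also works; read them back with loads, not load.
with open(path, "wb") as f:
    f.write(payload)
with open(path, "rb") as f:
    from_manual = dill.loads(f.read())
```

The last pair is the pattern from the question with the mismatch fixed: bytes obtained from `f.read()` go to `loads`, while the file object itself goes to `load`.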

It is recommended to open a file using the with statement.

Here:

with open(path) as fobj:
    # do something with fobj

has the same effect as:

fobj = open(path)
try:
    # do something with fobj
finally:
    fobj.close()

The file will be closed as soon as you leave the indented block of the with statement, even in the case of an exception.
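A small sketch to check this behaviour yourself, assuming a throwaway temporary file and a deliberately raised error:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical scratch file

try:
    with open(path, "w") as fobj:
        fobj.write("partial")
        raise ValueError("simulated failure")  # leave the block via an exception
except ValueError:
    pass

# The name fobj is still bound here, but the file behind it has been closed.
closed_after_error = fobj.closed
```

Here `closed_after_error` ends up `True`, which is what the `try`/`finally` version guarantees as well.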

answered Sep 25 '22 13:09

Mike Müller
