Pickle dump huge file without memory error

I have a program that adjusts the probability of certain things happening based on what is already known. My data is already saved as a pickled dictionary object in Dictionary.txt.

The problem is that every time I run the program, it reads Dictionary.txt, turns it into a dictionary object, makes its edits, and overwrites Dictionary.txt. This is pretty memory intensive, as Dictionary.txt is 123 MB. I get a MemoryError when I dump; everything seems fine when I load the file.

  • Is there a better (more efficient) way of doing the edits? (Perhaps without having to overwrite the entire file every time.)

  • Is there a way that I can invoke garbage collection (through the gc module)? (I already have it auto-enabled via gc.enable().)

  • I know that besides readlines() you can read a file line by line. Is there a way to edit the dictionary incrementally, line by line, when I already have a fully built dictionary object in the program?

  • Any other solutions?

Thank you for your time.

user2543682 asked Jul 07 '13 14:07


1 Answer

I was having the same issue. Switching to joblib solved it for me. I'm posting this in case someone wants to know about other possibilities.

    # Save the model to disk.
    # Note: sklearn.externals.joblib was removed in newer scikit-learn
    # releases, so joblib is imported directly here.
    import joblib

    filename = 'finalized_model.sav'
    joblib.dump(model, filename)

    # Some time later... load the model from disk.
    loaded_model = joblib.load(filename)
    result = loaded_model.score(X_test, Y_test)
    print(result)
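For the dictionary in the question, another possibility is the standard library's shelve module, which keeps a dict-like object on disk so that a single entry can be changed without rewriting the whole file. This is only a minimal sketch of that swapped-in technique, not what I used above: the file name, the keys, and the update_probability helper are all hypothetical and stand in for the real data.

```python
import os
import shelve
import tempfile


def update_probability(db_path, key, delta):
    """Adjust one stored value on disk without loading every other entry."""
    with shelve.open(db_path) as db:  # opens (or creates) the on-disk mapping
        db[key] = db.get(key, 0.0) + delta  # only this entry is rewritten
        return db[key]


# Hypothetical demo data standing in for the question's dictionary.
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "dictionary_shelf")

# One-time conversion: copy the existing entries into the shelf.
with shelve.open(db_path) as db:
    db["rain"] = 0.25
    db["sun"] = 0.5

# Later runs touch only the keys they need.
new_value = update_probability(db_path, "rain", 0.25)

# Read everything back, only to inspect the result of this small demo.
with shelve.open(db_path) as db:
    snapshot = dict(db)
```

Shelf keys have to be strings, and the values are pickled individually, so this fits best when each run edits a small part of the dictionary.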
Ch HaXam answered Sep 21 '22 01:09
