I have a program where I basically adjust the probability of certain things happening based on what is already known. My data file is already saved as a pickled dictionary object at Dictionary.txt.
The problem is that every time I run the program it pulls in Dictionary.txt, turns it into a dictionary object, makes its edits, and overwrites Dictionary.txt. This is pretty memory intensive, as Dictionary.txt is 123 MB. When I dump I get a MemoryError; everything seems fine when I load it.
Is there a better (more efficient) way of doing the edits? (Perhaps without having to overwrite the entire file every time.)
Is there a way that I can invoke garbage collection (through the gc module)? (I already have it auto-enabled via gc.enable().)
I know that besides readlines() you can read line by line. Is there a way to edit the dictionary incrementally, line by line, when I already have a fully completed dictionary object file in the program?
Any other solutions?
Thank you for your time.
Pickle doesn't support appending, so you'll have to save your data to a new file each time the program is run. Come up with a different file name: ask the user, or use a command-line parameter such as -o test.txt. On a related topic, don't use Pickle.
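A minimal sketch of the "write to a new file named on the command line" idea, assuming argparse for the -o parameter. The -i option, the function names parse_args and update_pickle, and the edit callback are hypothetical names chosen for illustration, not something from the original post.

```python
import argparse
import pickle


def parse_args(argv=None):
    """Parse the input and output file names from the command line."""
    parser = argparse.ArgumentParser(description="Update a pickled dictionary.")
    # -i is an assumed convenience option; the default is the file from the question.
    parser.add_argument("-i", "--input", default="Dictionary.txt")
    # -o names the new file to write, so the input file is never overwritten.
    parser.add_argument("-o", "--output", required=True)
    return parser.parse_args(argv)


def update_pickle(in_path, out_path, edit):
    """Load a pickled dict, apply edit(data) in place, and save to out_path."""
    with open(in_path, "rb") as f:
        data = pickle.load(f)
    edit(data)  # hypothetical callback that mutates the dictionary
    with open(out_path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    return data
```

A script could call args = parse_args() and then update_pickle(args.input, args.output, some_edit_function). Note that this still loads and dumps the whole dictionary, so it avoids clobbering the original file but does not by itself reduce memory use.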
First, import pickle, then define an example dictionary, which is an ordinary Python object. Next, open a file (note that it is opened to write bytes in Python 3+), use pickle.dump() to put the dict into the opened file, then close it. Use pickle.
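The steps just described could look roughly like this. The helper names save_dict and load_dict and the example dictionary contents are made up for illustration; a with block is used so the file is closed automatically.

```python
import pickle


def save_dict(obj, path):
    """Write obj to path as pickled bytes ('wb' = write binary)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_dict(path):
    """Read a pickled object back from path ('rb' = read binary)."""
    with open(path, "rb") as f:
        return pickle.load(f)


# An example dictionary; any picklable Python object would work.
example_dict = {"rain": 0.3, "sun": 0.7, "history": [1, 2, 3]}
```

Calling save_dict(example_dict, "save.p") followed by load_dict("save.p") should give back an equal dictionary.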
By default, the pickle data format uses a relatively compact binary representation. If you need optimal size characteristics, you can efficiently compress pickled data.
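One way to compress pickled data is to write it through the standard-library gzip module, sketched below. The function names dump_compressed and load_compressed are hypothetical. This shrinks the file on disk; it is not guaranteed to help with a MemoryError, since the dictionary is still held in memory in full.

```python
import gzip
import pickle


def dump_compressed(obj, path):
    """Pickle obj and gzip-compress it on the way to disk."""
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_compressed(path):
    """Read back an object written by dump_compressed."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```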
'wb' means 'write binary' and is the mode used for the file handle: open('save.p', 'wb'), which writes the pickled data into a file.
I was having the same issue. I used joblib and it did the job. Posting this in case someone wants to know about other possibilities.
# save the model to disk
import joblib  # sklearn.externals.joblib has been removed from newer scikit-learn releases
filename = 'finalized_model.sav'
joblib.dump(model, filename)

# some time later... load the model from disk
loaded_model = joblib.load(filename)
result = loaded_model.score(X_test, Y_test)
print(result)