I have several scripts running on a server that pickle and unpickle various dictionaries. They all use the same basic code for pickling and unpickling, as shown below:
# load the dictionary from disk
SellerDict = open('/home/hostadl/SellerDictkm', 'rb')
SellerDictionarykm = pickle.load(SellerDict)
SellerDict.close()

# ...later, write the dictionary back to disk
SellerDict = open('/home/hostadl/SellerDictkm', 'wb')
pickle.dump(SellerDictionarykm, SellerDict)
SellerDict.close()
All the scripts run fine except for one. The problem script goes to various websites, scrapes data, and stores it in a dictionary. It runs all day long, pickling and unpickling dictionaries, and stops at midnight; a cronjob then starts it again the next morning. The script can run for weeks without a problem, but about once a month it dies with an EOFError when it tries to load a dictionary. The dictionaries are usually about 80 MB. I even tried adding SellerDict.flush() before SellerDict.close() when pickling the data to make sure everything was being flushed.
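In case it's relevant: as I understand it, flush() only empties Python's internal buffer, while os.fsync() asks the OS to write its own buffer to disk as well. A minimal sketch of the save step with both calls, reusing the names from the snippet above (the fsync call is the only addition to my code):

import os
import pickle

SellerDict = open('/home/hostadl/SellerDictkm', 'wb')
pickle.dump(SellerDictionarykm, SellerDict)
SellerDict.flush()             # empty Python's internal buffer
os.fsync(SellerDict.fileno())  # ask the OS to push its buffer to disk
SellerDict.close()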
Any ideas what could be causing this? Python is pretty solid, so I don't think it is due to the size of the file. Since the code runs fine for a long time before dying, it leads me to believe that maybe something is being saved in the dictionary that is causing the issue, but I have no idea what.
Also, if you know of a better way to save dictionaries than pickle, I am open to options. Like I said earlier, the dictionaries are constantly being opened and closed. Just for clarification, only one program uses a given dictionary, so the issue is not caused by several programs trying to access the same dictionary.
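For what it's worth, one pattern I have been considering is writing the pickle to a temporary file in the same directory and renaming it over the real file, so a crash mid-write can never leave a truncated pickle behind. A minimal sketch, where save_dict_atomically is a hypothetical helper and not code from my scripts:

import os
import pickle
import tempfile

def save_dict_atomically(d, path):
    # write to a temp file in the same directory, then rename it
    # over the real file; os.rename is atomic on POSIX filesystems
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path))
    try:
        with os.fdopen(fd, 'wb') as f:
            pickle.dump(d, f)
            f.flush()
            os.fsync(f.fileno())
        os.rename(tmp_path, path)
    except Exception:
        os.remove(tmp_path)
        raise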
UPDATE:
Here is the traceback from my log file.
Traceback (most recent call last):
  File "/home/hostadl/CompileRecentPosts.py", line 782, in <module>
    main()
  File "/home/hostadl/CompileRecentPosts.py", line 585, in main
    SellerDictionarykm = pickle.load(SellerDict)
EOFError
So this actually turned out to be a memory issue. When the computer ran out of RAM and tried to unpickle or load the data, the process would fail with this EOFError. I increased the RAM on the computer, and it was never an issue again.
Thanks for all the comments and help.
Here's what happens when you don't use locking:
import pickle

# define initial dict
orig_dict = {'foo': 'one'}

# write dict to file
writedict_file = open('./mydict', 'wb')
pickle.dump(orig_dict, writedict_file)
writedict_file.close()

# read the dict from file
readdict_file = open('./mydict', 'rb')
mydict = pickle.load(readdict_file)
readdict_file.close()

# now we have new data to save
new_dict = {'foo': 'one', 'bar': 'two'}
writedict_file = open('./mydict', 'wb')
# pickle.dump(new_dict, writedict_file)
# writedict_file.close()

# but...whoops! before we could save the data,
# some other reader tried opening the file;
# now they are having a problem
readdict_file = open('./mydict', 'rb')
mydict = pickle.load(readdict_file)  # errors out here
readdict_file.close()
Here's the output:
python pickletest.py
Traceback (most recent call last):
  File "pickletest.py", line 26, in <module>
    mydict = pickle.load(readdict_file)  # errors out here
  File "/usr/lib/python2.6/pickle.py", line 1370, in load
    return Unpickler(file).load()
  File "/usr/lib/python2.6/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.6/pickle.py", line 880, in load_eof
    raise EOFError
EOFError
Eventually, some reading process is going to try to read the pickled file while a writing process already has it open for writing. You need some way to tell whether another process has the file open for writing before you try to read from it.
For a very simple solution, have a look at this thread that discusses using Filelock.
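As an illustration only, here is a minimal sketch of that idea using fcntl.flock from the standard library (Unix only). Locking a sidecar '.lock' file rather than the data file itself avoids truncating the pickle while a reader still holds a lock; the lock-file suffix and both function names are assumptions for this example, not part of Filelock:

import fcntl
import pickle

def load_locked(path):
    with open(path + '.lock', 'w') as lock:
        fcntl.flock(lock, fcntl.LOCK_SH)  # shared lock: readers may overlap
        with open(path, 'rb') as f:
            return pickle.load(f)

def dump_locked(obj, path):
    with open(path + '.lock', 'w') as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # exclusive lock: writer waits for readers
        with open(path, 'wb') as f:
            pickle.dump(obj, f)

flock locks are released automatically when the lock file is closed, so no explicit unlock is needed.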