 

Can Pickle handle files larger than the RAM installed on my machine?

I'm using pickle to save my NLP classifier, built with the TextBlob library, to disk.

I settled on pickle after a lot of searching around this question. At the moment I'm working locally, and I have no problem loading the pickle file (which is 1.5 GB) on my machine with an i7 and 16 GB of RAM. But the idea is that my program will eventually have to run on my server, which only has 512 MB of RAM installed.

Can pickle handle such a large file or will I face memory issues?

The server runs Python 3.5 on Linux (I'm not sure which distribution).

I'm asking because at the moment I can't access the server, so I can't just try it and find out what happens, but I need to know whether I can keep this approach or have to find another solution.

Nico asked Nov 27 '15 at 21:11


People also ask

Does pickling reduce file size?

The .pickle file and the .csv files took up about the same space, around 40 MB, but the compressed pickle file took up only 1.5 MB. That's a lot of saved space. Another big difference is in the load times.
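
Compression is easy to add because pickle writes to any file-like object. Here is a minimal sketch of the idea (the filenames and payload are hypothetical, and gzip is just one choice of codec):

    import gzip
    import pickle

    data = {"words": ["spam", "eggs"] * 100000}  # hypothetical payload

    # Plain pickle on disk.
    with open("model.pkl", "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

    # Compressed pickle: gzip.open returns a file-like object,
    # so pickle.dump can write straight into the compressed stream.
    with gzip.open("model.pkl.gz", "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)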

Is pickle more efficient than CSV?

Stop Using CSVs for Storage — Pickle is an 80 Times Faster Alternative.

What are the advantages of a pickle file?

The advantage of using pickle is that it can serialize pretty much any Python object without your having to add any extra code. It's also smart in that it will only write out any single object once, making it effective for storing recursive structures like graphs.
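
A small sketch of that behaviour with a self-referential dict: pickle memoizes the object and records a back-reference instead of recursing forever.

    import pickle

    # A recursive structure: the dict contains a reference to itself.
    graph = {"label": "root"}
    graph["self"] = graph

    restored = pickle.loads(pickle.dumps(graph))

    # The cycle survives the round trip because the dict was written
    # once and the inner reference points back to it.
    assert restored["self"] is restored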

How do I read a large pickle in Python?

The process of loading a pickled file back into a Python program is similar to the one you saw previously: use the open() function again, but this time with 'rb' as the second argument (instead of 'wb'). The r stands for read mode and the b stands for binary mode: you'll be reading a binary file. Assign this to infile.
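
That load step looks like the following minimal sketch (the filename is hypothetical):

    import pickle

    # 'rb' = read + binary, matching the 'wb' used when the file was written.
    with open("classifier.pkl", "rb") as infile:
        classifier = pickle.load(infile)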


1 Answer

Unfortunately, this is difficult to answer accurately without testing it on your machine.

Here are some initial thoughts:

  1. There is no inherent size limit that the pickle module enforces, but you're pushing the boundaries of its intended use; it's not designed for individual objects this large. However, since you're using Python 3.5, you can take advantage of PEP 3154 (pickle protocol 4), which adds better support for large objects. You should specify pickle.HIGHEST_PROTOCOL when you dump your data (see the sketch after this list).

  2. You will likely take a large performance hit because you're trying to deal with an object that is three times the size of your RAM. Your system will probably start swapping, and possibly even thrashing. RAM is cheap these days; bumping it up to at least 2 GB should help significantly.

  3. To handle the swapping, make sure you have enough swap space available (a large swap partition if you're on Linux, or enough space for the swap file on your primary partition on Windows).

  4. As pal sch's comment shows, pickle is not very friendly to RAM consumption during the pickling process, so you may have to deal with Python requesting even more memory from the OS than the 1.5 GB we might expect for your object. Dumping straight to a file handle, as in the sketch below, at least avoids building a second full-size copy of the byte stream in memory.
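
Here is a minimal sketch of points 1 and 4 combined (the model object is a stand-in for your classifier): pickle.HIGHEST_PROTOCOL selects protocol 4 on Python 3.5, and dumping directly to the open file handle avoids the extra full-size bytes object that pickle.dumps() would build in memory first.

    import pickle

    # Stand-in for the trained TextBlob classifier; any picklable object works.
    model = {"weights": list(range(1000000))}

    # Protocol 4 (PEP 3154, available since Python 3.4) adds 64-bit framing
    # for objects larger than 4 GB; HIGHEST_PROTOCOL selects it on 3.5.
    with open("classifier.pkl", "wb") as outfile:
        pickle.dump(model, outfile, protocol=pickle.HIGHEST_PROTOCOL)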

Given these considerations, I don't expect it to work out very well for you. I'd strongly suggest upgrading the RAM on your target machine to make this work.

skrrgwasme answered Oct 24 '22 at 09:10