I have a Python script that compresses a big string:
import zlib
def processFiles():
    ...
    s = """Large string more than 2Gb"""
    data = zlib.compress(s)
    ...
When I run this script, I get this error:
ERROR: Traceback (most recent call last):
  File "./../commands/sce.py", line 438, in processFiles
    data = zlib.compress(s)
OverflowError: size does not fit in an int
Some information:
zlib.__version__ = '1.0'
zlib.ZLIB_VERSION = '1.2.7'
# python -V
Python 2.7.3
# uname -a
Linux app2 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux
# free
             total       used       free     shared    buffers     cached
Mem:      65997404    8096588   57900816          0     184260    7212252
-/+ buffers/cache:      700076   65297328
Swap:     35562236          0   35562236
# ldconfig -p | grep python
libpython2.7.so.1.0 (libc6,x86-64) => /usr/lib/libpython2.7.so.1.0
libpython2.7.so (libc6,x86-64) => /usr/lib/libpython2.7.so
How can I compress big data (more than 2 GB) in Python?
In Python 3, compress(text) should be compressed = zlib.compress(text.encode()), since zlib.compress() only accepts bytes, not str. Compression also tends to be most effective on longer strings.
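A minimal sketch of that fix (Python 3; the sample text here is just an illustration):

import zlib

text = "some long, repetitive string " * 1000   # hypothetical sample data
compressed = zlib.compress(text.encode())       # encode str -> bytes, then compress
restored = zlib.decompress(compressed).decode() # reverse the steps
assert restored == text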
Another option is standard Python pickle, thinly wrapped with a standard compression library. The pickle package provides an excellent default tool for serializing arbitrary Python objects and storing them to disk, and the standard library also includes a broad set of compression modules.
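For example, a short sketch of that combination (the object and file name are throwaway assumptions):

import pickle
import zlib

obj = {"numbers": list(range(1000)), "label": "example"}  # hypothetical object

# Serialize with pickle, then compress the resulting bytes with zlib.
blob = zlib.compress(pickle.dumps(obj))
with open("obj.pkl.z", "wb") as f:  # hypothetical file name
    f.write(blob)

# Reverse the steps: read, decompress, unpickle.
with open("obj.pkl.z", "rb") as f:
    restored = pickle.loads(zlib.decompress(f.read()))
assert restored == obj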
My function to compress large data:
def compressData(self, s):
    # Feed the data to a single compressobj in 1 GiB slices, so that no
    # individual zlib call receives more bytes than fit in a C int.
    compressed = ''
    begin = 0
    blockSize = 1073741824  # 1 GiB
    compressor = zlib.compressobj()
    while begin < len(s):
        compressed = compressed + compressor.compress(s[begin:begin + blockSize])
        begin = begin + blockSize
    compressed = compressed + compressor.flush()  # emit any buffered output
    return compressed
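For completeness, a matching chunked decompressor (a sketch along the same lines, with the same assumed 1 GiB block size):

def decompressData(self, compressed):
    # Counterpart to compressData: feed the compressed bytes to a single
    # decompressobj in 1 GiB slices, so no one zlib call exceeds the int limit.
    decompressed = ''
    begin = 0
    blockSize = 1073741824  # 1 GiB
    decompressor = zlib.decompressobj()
    while begin < len(compressed):
        decompressed = decompressed + decompressor.decompress(compressed[begin:begin + blockSize])
        begin = begin + blockSize
    decompressed = decompressed + decompressor.flush()
    return decompressed

Note that repeated concatenation of multi-GB strings can be slow; collecting the pieces in a list and calling ''.join(pieces) once at the end is usually faster.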
This is not a RAM issue. The Python zlib binding passes the buffer size as a C int, so a single zlib.compress() call cannot accept data larger than INT_MAX bytes (just under 2 GiB).
Split your data into chunks below that limit and process each one separately, for example as sketched below.
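A sketch of that chunked approach which also streams the compressed output straight to disk, so neither input nor output has to pass through one giant zlib call (the chunk size, function name, and file name are assumptions):

import zlib

CHUNK = 1073741824  # 1 GiB, safely below INT_MAX

def compress_to_file(data, path):
    # Compress the data chunk by chunk and write each compressed piece
    # immediately, instead of building one huge compressed string in memory.
    compressor = zlib.compressobj()
    with open(path, 'wb') as out:
        for begin in range(0, len(data), CHUNK):
            out.write(compressor.compress(data[begin:begin + CHUNK]))
        out.write(compressor.flush())

compress_to_file(s, 'big_string.z')  # hypothetical usage with the big string s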