Trouble with compressing big data in python

Tags:

python

zlib

I have a Python script that compresses a big string:

import zlib

def processFiles():
  ...
  s = """Large string more than 2Gb"""
  data = zlib.compress(s)
  ...

When I run this script, I get the following error:

Traceback (most recent call last):
  File "./../commands/sce.py", line 438, in processFiles
    data = zlib.compress(s)
OverflowError: size does not fit in an int

Some information:

zlib.__version__ = '1.0'

zlib.ZLIB_VERSION = '1.2.7'

# python -V
Python 2.7.3

# uname -a
Linux app2 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux

# free
             total       used       free     shared    buffers     cached
Mem:      65997404    8096588   57900816          0     184260    7212252
-/+ buffers/cache:     700076   65297328
Swap:     35562236          0   35562236

# ldconfig -p | grep python
libpython2.7.so.1.0 (libc6,x86-64) => /usr/lib/libpython2.7.so.1.0
libpython2.7.so (libc6,x86-64) => /usr/lib/libpython2.7.so

How can I compress big data (more than 2 GB) in Python?

asked May 27 '14 by Dmitry Skryabin



2 Answers

My function to compress large data:

def compressData(self, s):
    # zlib.compress() hands the whole buffer to C in a single call, which
    # overflows a C int for data over 2 GB, so feed a compressobj in blocks.
    compressor = zlib.compressobj()
    blockSize = 1073741824  # 1 GB
    chunks = []
    begin = 0
    while begin < len(s):
        chunks.append(compressor.compress(s[begin:begin + blockSize]))
        begin += blockSize
    chunks.append(compressor.flush())
    return ''.join(chunks)
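
For symmetry, the result can be decompressed the same way by streaming it through zlib.decompressobj(); this is a minimal sketch added for illustration, not part of the original answer (the name decompressData is made up):

import zlib

def decompressData(compressed):
    # Mirror of compressData(): feed the compressed bytes to a
    # decompressobj in 1 GB slices so no single call overflows a C int.
    decompressor = zlib.decompressobj()
    blockSize = 1073741824  # 1 GB
    chunks = []
    for begin in xrange(0, len(compressed), blockSize):
        chunks.append(decompressor.decompress(compressed[begin:begin + blockSize]))
    chunks.append(decompressor.flush())
    return ''.join(chunks)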
answered Oct 04 '22 by Dmitry Skryabin


This is not a RAM issue. The CPython zlib binding passes the buffer length to C as an int, so a single zlib.compress() call cannot handle data larger than 2 GB.

Split your data into smaller chunks and process each one separately, as in the sketch below.
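
For example, if the big data lives in a file, it can be streamed through a compressobj block by block so no single call ever sees an oversized buffer; a rough sketch, with placeholder file names:

import zlib

CHUNK = 64 * 1024 * 1024  # read 64 MB at a time

# 'big_input.dat' and 'big_input.dat.z' are hypothetical names.
compressor = zlib.compressobj()
with open('big_input.dat', 'rb') as src, open('big_input.dat.z', 'wb') as dst:
    while True:
        block = src.read(CHUNK)
        if not block:
            break
        dst.write(compressor.compress(block))
    dst.write(compressor.flush())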

answered Oct 04 '22 by JBernardo