 

Compress large file in Ruby with Zlib for gzip

Tags: ruby, gzip, zlib

I have a very large file, approx. 200 million rows of data.

I would like to compress it with the Zlib library, specifically using Zlib::GzipWriter.

Reading through each line one at a time seems like it would take quite a bit of time. Is there a better way to accomplish this?

Here is what I have right now:

require 'zlib'

Zlib::GzipWriter.open('compressed_file.gz') do |gz|
  File.open(large_data_file).each do |line|
    gz.write line
  end
  gz.close
end
Jackson, asked Jun 30 '14




1 Answer

You can use IO#read to read a chunk of arbitrary length from the file.

require 'zlib'

Zlib::GzipWriter.open('compressed_file.gz') do |gz|
  # Open in binary mode so chunks are read as raw bytes.
  File.open(large_data_file, 'rb') do |fp|
    # Read fixed-size chunks rather than individual lines.
    while (chunk = fp.read(16 * 1024))
      gz.write chunk
    end
  end
  # No explicit gz.close needed; the block form closes the stream.
end

This will read the source file in 16 KiB chunks and feed each chunk to the gzip stream, which compresses as it writes. Adjust the chunk size to your preference based on your environment.
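If you would rather not write the read loop yourself, IO.copy_stream can do the same chunked copy internally; a GzipWriter works as the destination because it responds to write. A minimal sketch of that variant (not from the original answer), again assuming large_data_file holds the source path:

require 'zlib'

Zlib::GzipWriter.open('compressed_file.gz') do |gz|
  # IO.copy_stream manages the chunked read/write loop internally;
  # any destination that responds to #write (like GzipWriter) works.
  IO.copy_stream(large_data_file, gz)
end

The effect is the same as the explicit loop above; it just pushes the buffering down into Ruby's standard library.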

Chris Heald, answered Oct 24 '22