I have a very large file, approx. 200 million rows of data.
I would like to compress it with the Zlib library, specifically using Zlib::GzipWriter.
Reading through the file one line at a time seems like it would take quite a bit of time. Is there a better way to accomplish this?
Here is what I have right now:
require 'zlib'

Zlib::GzipWriter.open('compressed_file.gz') do |gz|
  File.open(large_data_file).each do |line|
    gz.write line
  end
  gz.close
end
Rather than iterating line by line, you can use IO#read to read a chunk of arbitrary length from the file.
require 'zlib'

Zlib::GzipWriter.open('compressed_file.gz') do |gz|
  File.open(large_data_file) do |fp|
    # Read the source in fixed-size chunks instead of line by line
    while chunk = fp.read(16 * 1024)
      gz.write chunk
    end
  end
  # No explicit gz.close needed: the block form closes the writer automatically
end
This reads the source file in 16 KB chunks and writes each chunk to the compressed output stream. Adjust the chunk size to suit your environment.
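If you want to pick a chunk size empirically, a minimal sketch like the following can compare a few candidates using Ruby's Benchmark module (the path in large_data_file, the output filenames, and the candidate sizes are placeholders to adjust for your setup):

require 'zlib'
require 'benchmark'

# Time the same chunked compression loop with a few candidate chunk sizes
[16 * 1024, 64 * 1024, 256 * 1024].each do |chunk_size|
  elapsed = Benchmark.realtime do
    Zlib::GzipWriter.open("compressed_#{chunk_size}.gz") do |gz|
      File.open(large_data_file) do |fp|
        while chunk = fp.read(chunk_size)
          gz.write chunk
        end
      end
    end
  end
  puts "#{chunk_size / 1024} KB chunks: #{elapsed.round(2)} s"
end

In practice, once the chunk size reaches a few tens of kilobytes the runtime tends to be dominated by compression rather than by reads, so very large chunks mostly just cost extra memory.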