After reviewing the SO post "Ruby: Split binary data", I used the following code, which works.
z = 'A' * 1_000_000
z.bytes.each_slice( STREAMING_CHUNK_SIZE ).each do | chunk |
  c = chunk.pack( 'C*' )
end
However, it is very slow:
Benchmark.realtime do
  ...
end
=> 0.0983949700021185
That is 98 ms to slice and pack 1 MB of data.
Use Case:
The server receives binary data from an external API and streams it using socket.write chunk.pack( 'C*' ).
The data is expected to be between 50KB and 5MB, with an average of 500KB.
So, how can I efficiently slice binary data in Ruby?
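Your code looks nice and uses the correct Ruby methods and syntax, but it still does a lot of unnecessary work: z.bytes converts the whole string into a huge Array of Integers, each_slice cuts that Array into smaller Arrays, and pack turns each of those back into a String.
The following code extracts the parts directly from the string, without converting anything: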
def get_binary_chunks(string, size)
  Array.new((string.bytesize + size - 1) / size) { |i| string.byteslice(i * size, size) }
end
(string.bytesize + size - 1) / size rounds the chunk count up, so the last chunk is not missed if it is smaller than size. bytesize is used rather than length because byteslice counts bytes, not characters.
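As a quick check of the rounding-up division, here is a small sketch with a made-up 10-byte string and a chunk size of 4 (both values are only for illustration, and bytesize is used so that the count is in bytes):

```ruby
# Illustrative values only: a 10-byte string split into 4-byte chunks.
data = 'abcdefghij'.b
size = 4

# Plain integer division would drop the trailing partial chunk.
floor_count = data.bytesize / size               # 2
# Adding size - 1 before dividing rounds the count up instead.
ceil_count = (data.bytesize + size - 1) / size   # 3

chunks = Array.new(ceil_count) { |i| data.byteslice(i * size, size) }
p chunks.map(&:bytesize)   # [4, 4, 2]
```

The last chunk holds the 2 leftover bytes, and joining the chunks gives back the original string.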
With a 500 kB PDF file and chunks of 12345 bytes, Fruity returns:
Running each test 16 times. Test will take about 28 seconds.
_eric_duminil is faster than _b_seven by 380x ± 100.0
get_binary_chunks is also 6 times faster than StringIO#each(n) in this example.
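The numbers above came from Fruity on one particular machine. To get a rough feel for the difference on your own setup, something like the following sketch could be used. It relies on the standard library's Benchmark instead of Fruity, and the payload and chunk size are illustrative stand-ins, so the ratios will not match the ones quoted here.

```ruby
require 'benchmark'
require 'stringio'

# Illustrative payload: 1 MB of 'A' bytes with BINARY encoding.
data = ('A' * 1_000_000).b
size = 12_345

# The approach from the question: String -> Integers -> Arrays -> Strings.
via_pack = -> { data.bytes.each_slice(size).map { |chunk| chunk.pack('C*') } }

# Direct slicing of the string, no intermediate Integer arrays.
via_byteslice = lambda do
  Array.new((data.bytesize + size - 1) / size) { |i| data.byteslice(i * size, size) }
end

# Reading fixed-size pieces through StringIO.
via_stringio = -> { StringIO.new(data).each(size).to_a }

Benchmark.bm(10) do |x|
  x.report('pack')      { via_pack.call }
  x.report('byteslice') { via_byteslice.call }
  x.report('stringio')  { via_stringio.call }
end
```

All three lambdas return the same chunks for this payload, so only the time taken differs.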
If you're sure the string is binary (not UTF-8 with multibyte characters like 'ä'), you can use slice instead of byteslice:
def get_binary_chunks(string, size)
  Array.new((string.length + size - 1) / size) { |i| string.slice(i * size, size) }
end
which makes the code even faster (about 500x compared to your method).
If you use this code with a Unicode String, the chunks will have size characters but might have more than size bytes.
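A minimal sketch of that difference, using a made-up four-character string (the 'ä' is written as an escape so the example does not depend on the file's encoding):

```ruby
# Illustrative string: 'ä' takes 2 bytes in UTF-8, so 4 characters are 6 bytes.
text = "\u00E4\u00E4ab"
size = 2

char_chunk = text.slice(0, size)       # counts characters
byte_chunk = text.byteslice(0, size)   # counts bytes

puts char_chunk.bytesize   # 4 (two 'ä' characters)
puts byte_chunk.bytesize   # 2 (one 'ä' character)

# Once the string is tagged as BINARY, slice counts bytes as well.
binary = text.dup.force_encoding(Encoding::BINARY)
puts binary.slice(0, size).bytesize   # 2
```

So slice is only safe as a byte-exact replacement when the string really has BINARY encoding.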
Finally, if you're not interested in getting an Array of Strings, you could use the chunks directly:
def send_binary_chunks(socket, string, size)
  ((string.length + size - 1) / size).times do |i|
    socket.write string.slice(i * size, size)
  end
end
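To try send_binary_chunks without a real connection, a StringIO can stand in for the socket, since it also responds to write. This is only a sketch: the fake socket, the payload and the 4096-byte chunk size are assumptions made for the demonstration.

```ruby
require 'stringio'

# Same method as above, repeated so the sketch is self-contained.
def send_binary_chunks(socket, string, size)
  ((string.length + size - 1) / size).times do |i|
    socket.write string.slice(i * size, size)
  end
end

# StringIO is a stand-in for the real socket (assumption for this demo).
fake_socket = StringIO.new(String.new(encoding: Encoding::BINARY))
payload = ('A' * 10_000).b

send_binary_chunks(fake_socket, payload, 4096)
puts fake_socket.string.bytesize   # 10000
```

Everything written to the stand-in ends up in fake_socket.string, which makes it easy to check that no bytes were lost or reordered.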
Use StringIO#each(n) with a string that has BINARY encoding:
require 'stringio'
string.force_encoding(Encoding::BINARY)
StringIO.new(string).each(size) { |chunk| socket.write(chunk) }
This only allocates each intermediate chunk just before pushing it to the socket.
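Two details may be worth knowing here. force_encoding changes the string in place, so a variant that uses String#b (which returns a BINARY-encoded copy) leaves the original untouched. Also, each(n) is line-oriented: besides the byte limit, it ends a chunk at every newline, so some chunks can be shorter than n, although the concatenated output stays the same. A small sketch with a made-up payload:

```ruby
require 'stringio'

# Hypothetical payload containing multibyte characters and a newline.
payload = "\u00E4\u00E4\nabcdefg"   # 12 bytes in UTF-8
size = 4

chunks = []
# String#b returns a BINARY-encoded copy, so payload itself is not modified.
StringIO.new(payload.b).each(size) { |chunk| chunks << chunk }

# No chunk exceeds size bytes; a chunk may be shorter where a newline occurs.
p chunks.map(&:bytesize)
```

For streaming to a socket the shorter chunks are harmless, since the receiver sees the same byte sequence either way.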