I have a collection of input sources -- strings, files, etc. -- that I want to concatenate and pass to an API that expects to read from a single IO object. The files can be quite large (~10 GB), so reading them into memory and concatenating them into a single string isn't an option. (I also considered using IO.pipe, but spinning up extra threads or processes seems like overkill.)
Is there an existing library class for this in Ruby, similar to Java's SequenceInputStream? If not, is there some other way to do it straightforwardly and idiomatically?
Unfortunately, the API writes to a socket with IO.copy_stream.
For IO::copy_stream(src, ...) to work, the « IO-like object for src should have readpartial or read method ».
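To illustrate that duck typing, here's a minimal sketch of a source that is not an IO at all and only defines read. The class name ChunkSource is hypothetical, and it wraps a single in-memory string just to keep the example short; the point is that IO.copy_stream only calls read(maxlen, out_string) on it and stops when it gets nil back.

```ruby
require "stringio"

# Hypothetical non-IO source that defines only #read, using IO#read's
# (maxlen, out_string) calling convention.
class ChunkSource
  def initialize(data)
    @data = data.b  # binary copy of the payload
    @pos = 0        # byte offset of the next unread byte
  end

  def read(maxlen = nil, out_string = nil)
    raise ArgumentError, "negative length #{maxlen} given" if maxlen&.negative?
    if maxlen && maxlen > 0 && @pos >= @data.bytesize
      out_string&.clear
      return nil    # positive maxlen at end of data: signal EOF, like IO#read
    end
    len = maxlen || (@data.bytesize - @pos)  # nil maxlen means "everything left"
    chunk = @data.byteslice(@pos, len)
    @pos += chunk.bytesize
    out_string ? out_string.replace(chunk) : chunk
  end
end

dst = StringIO.new
IO.copy_stream(ChunkSource.new("hello, world"), dst)
print dst.string, "\n"
```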
So, let's try to create a class that can read over a sequence of IO objects; here's the spec of IO#read:
read(maxlen = nil) → string or nil
read(maxlen = nil, out_string) → out_string or nil
Reads bytes from the stream (in binary mode):
- If maxlen is nil, reads all bytes.
- Otherwise reads maxlen bytes, if available.
- Otherwise reads all bytes.
Returns a string (either a new string or the given out_string) containing the bytes read. The encoding of the string depends on both maxlen and out_string:
- maxlen is nil: uses internal encoding of self (regardless of whether out_string was given).
- maxlen not nil:
  - out_string given: encoding of out_string not modified.
  - out_string not given: ASCII-8BIT is used.
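StringIO makes a quick sandbox for these rules, since for the cases below it follows the same read conventions. This sketch just exercises the edge cases a concatenating class has to reproduce: short reads, nil at EOF for a positive maxlen, an empty string for a zero or nil maxlen, and out_string being overwritten rather than appended to.

```ruby
require "stringio"

io = StringIO.new("abc")

a = io.read(2)   # "ab"
b = io.read(2)   # "c": fewer bytes than requested, because only one is left
c = io.read(2)   # nil: a positive maxlen at EOF signals end of stream
d = io.read(0)   # "": a zero maxlen never signals EOF
e = io.read      # "": a nil maxlen at EOF returns an empty string

io.rewind
buf = +"old contents"
io.read(2, buf)  # out_string is overwritten with the bytes read, not appended to

p [a, b, c, d, e, buf]
```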
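class ConcatIO
  def initialize(*io)
    @array = io   # the sources, read in order
    @index = 0    # index of the source currently being read
  end

  def read(maxlen = nil, out_string = (maxlen.nil? ? +"" : String.new))
    out_string.clear
    if maxlen.nil?
      # Read everything that's left, from the current source onward
      if @index < @array.count
        @array[@index..-1].each { |io| out_string.concat(io.read) }
        @index = @array.count
      end
    elsif maxlen >= 0
      # Collect up to maxlen bytes, moving to the next source at each EOF
      while out_string.bytesize < maxlen && @index < @array.count
        bytes = @array[@index].read(maxlen - out_string.bytesize)
        if bytes.nil?
          @index += 1
        else
          out_string.concat(bytes)
        end
      end
      # Like IO#read: return nil at EOF when a positive maxlen was requested
      return nil if maxlen > 0 && out_string.empty?
    else
      raise ArgumentError, "negative length #{maxlen} given"
    end
    out_string
  end
end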
Note: the code doesn't accurately follow the encoding part of the spec.
Now let's use this class with IO::copy_stream:
require 'stringio'
io1 = StringIO.new( "1")
io2 = StringIO.new( "22")
io3 = StringIO.new("333")
ioN = StringIO.new( "\n")
catio = ConcatIO.new(io1,io2,io3,ioN)
print catio.read(2), "\n"
IO.copy_stream(catio,STDOUT)
And it works!
12
2333
There is in fact a multi_io gem for concatenating multiple IO sources into a single IO object, but its methods don't follow the specs of the IO class; for example, it doesn't work with IO::copy_stream.
Also, even if you're able to use ARGF (i.e. you're only handling input files listed in ARGV), you still have to be cautious: some of ARGF's methods behave slightly differently from IO's, so it's not 100% safe to feed ARGF to an API that needs to read from an IO object.
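For the files-only case, here's a short sketch of ARGF doing the concatenation. The temporary files stand in for command-line arguments, and ARGV is replaced programmatically only for the sake of the example. As one example of the differences mentioned above, the documentation of ARGF.readpartial says it returns empty strings at the end of each file except the last, whereas IO#readpartial raises EOFError at end of file.

```ruby
require "tmpdir"

combined = nil

Dir.mktmpdir do |dir|
  # Two temporary files standing in for command-line arguments
  paths = %w[a.txt b.txt].map { |name| File.join(dir, name) }
  File.write(paths[0], "hello, ")
  File.write(paths[1], "world\n")

  # Only for this example: point ARGF at the files by replacing ARGV
  ARGV.replace(paths)
  combined = ARGF.read  # contents of both files, concatenated in order
end

print combined
```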
Because there's no gem or core class for this, the only sensible workaround is to determine which IO methods the API requires and write a class that implements them. That isn't too hard as long as you don't have to implement the whole IO interface, and you already have a working read method in this answer.
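As an example of implementing only what the API needs: IO.copy_stream looks for readpartial before read on a non-IO source, so a class exposing just readpartial is enough for it. Here's a minimal sketch; SequenceIO is a hypothetical name, and it assumes every source implements readpartial(maxlen, outbuf) and raises EOFError when exhausted, as IO and StringIO do.

```ruby
require "stringio"

# Hypothetical minimal wrapper that implements only #readpartial.
class SequenceIO
  def initialize(*sources)
    @sources = sources  # a fresh array, so shifting it doesn't touch the caller's
  end

  # Returns up to maxlen bytes from the current source. When that source is
  # exhausted, moves on to the next one; raises EOFError only after the last.
  def readpartial(maxlen, outbuf = nil)
    outbuf ||= String.new
    until @sources.empty?
      begin
        return @sources.first.readpartial(maxlen, outbuf)
      rescue EOFError
        @sources.shift  # current source exhausted; try the next one
      end
    end
    outbuf.clear
    raise EOFError, "end of file reached"
  end
end

src = SequenceIO.new(StringIO.new("1"), StringIO.new("22"), StringIO.new("333\n"))
dst = StringIO.new
IO.copy_stream(src, dst)  # copy_stream treats the EOFError as end of input
print dst.string
```

Raising EOFError only once, after the last source, keeps the behavior in line with IO#readpartial, so callers never see an empty string at the boundary between two sources.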