
Concatenate multiple input sources into a single IO object in Ruby

Tags: io, ruby

I have a collection of input sources -- strings, files, etc. -- that I want to concatenate and pass to an API that expects to read from a single IO object. The files can be quite large (~10 GB), so reading them into memory and concatenating them into a single string isn't an option. (I also considered using IO.pipe, but spinning up extra threads or processes seems like overkill.)

Is there an existing library class for this in Ruby, cf. Java's SequenceInputStream? If not, is there some other way to do it straightforwardly and idiomatically?

David Moles asked Feb 14 '26 11:02


1 Answer

Unfortunately it's writing to a socket with IO.copy_stream

For IO::copy_stream(src, ...) to work, the « IO-like object for src should have readpartial or read method »

So, let's try to create a class that can read over a sequence of IO objects; here's the spec of IO#read:

read(maxlen = nil) → string or nil
read(maxlen = nil, out_string) → out_string or nil

Reads bytes from the stream (in binary mode):

  • If maxlen is nil, reads all bytes.
  • Otherwise reads maxlen bytes, if available.
  • Otherwise reads all bytes.

Returns a string (either a new string or the given out_string) containing the bytes read. The encoding of the string depends on both maxlen and out_string:

  • maxlen is nil: uses internal encoding of self (regardless of whether out_string was given).
  • maxlen not nil:
    • out_string given: encoding of out_string not modified.
    • out_string not given: ASCII-8BIT is used.
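
The end-of-stream cases in this spec are the ones a wrapper class has to reproduce. As a minimal sketch, assuming StringIO follows the same read contract as IO for these calls, the relevant behaviors look like this:

```ruby
require 'stringio'

io = StringIO.new("abc")

first  = io.read(2)   # up to 2 bytes
second = io.read(2)   # only 1 byte is left, so a shorter string comes back
at_eof = io.read(2)   # positive maxlen at end of stream
zero   = io.read(0)   # zero-length read at end of stream
rest   = io.read      # no maxlen at end of stream

p first, second, at_eof, zero, rest
```

The distinction to keep in mind is that a positive maxlen at end of stream yields nil, while read(0) and read with no argument yield an empty string.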
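
# Presents a sequence of IO-like objects as a single readable stream.
class ConcatIO

  def initialize(*io)
    @array = io
    @index = 0   # index of the source currently being read
  end

  # Mimics IO#read: read() returns all remaining bytes; read(n) returns
  # up to n bytes, or nil once every source is exhausted.
  def read(maxlen = nil, out_string = (maxlen.nil? ? +"" : String.new))
    out_string.clear
    if maxlen.nil?
      if @index < @array.count
        # Drain every remaining source.
        @array[@index..-1].each { |io| out_string.concat(io.read) }
        @index = @array.count
      end
    elsif maxlen >= 0
      while out_string.bytesize < maxlen && @index < @array.count
        bytes = @array[@index].read(maxlen - out_string.bytesize)
        if bytes.nil?
          @index += 1   # current source is at EOF; move on to the next one
        else
          out_string.concat(bytes)
        end
      end
      # For a positive maxlen, IO#read signals EOF with nil; read(0) returns "".
      return nil if maxlen > 0 && out_string.empty?
    end
    out_string
  end

end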

Note: this code does not implement the encoding rules of the spec.

Now let's use this class with IO::copy_stream:

require 'stringio'

io1 = StringIO.new(  "1")
io2 = StringIO.new( "22")
io3 = StringIO.new("333")
ioN = StringIO.new( "\n")

catio = ConcatIO.new(io1,io2,io3,ioN)

print catio.read(2), "\n"
IO.copy_stream(catio,STDOUT)

And it works!

12
2333

Aside

In fact, there is a multi_io gem for concatenating multiple IO sources into a single IO object. The problem is that its methods don't follow the specs of the IO class; for example, it doesn't work with IO::copy_stream.

Additionally, even if you're able to use ARGF (i.e. you're only handling input files listed in ARGV), you still have to be cautious: there are slight differences between some of ARGF's and IO's methods, so it's not 100% safe to feed ARGF to an API that expects to read from an IO object.


Conclusion

Because there's neither a suitable gem nor a core class for this, the only sensible workaround is to determine which IO methods the API requires and write a class that implements them. That isn't much work, as long as you don't have to implement the whole IO interface. Besides, you already have a working read method in my answer 😉.
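
As an illustration of implementing only what the API needs: IO::copy_stream accepts a source that has readpartial instead of read, so a wrapper can get away with that single method. The sketch below is one possible shape, not an existing library class; the name SequenceReader is hypothetical, and the sources are assumed to respond to read(maxlen) the way IO and StringIO do.

```ruby
require 'stringio'

# Hypothetical class: exposes only readpartial over a list of IO-like sources.
class SequenceReader
  def initialize(*sources)
    @sources = sources
  end

  # Returns up to maxlen bytes from the current source, moving on to the
  # next source when the current one is exhausted. Raises EOFError once
  # every source has been consumed, as IO#readpartial does.
  def readpartial(maxlen, outbuf = String.new)
    return outbuf.clear if maxlen.zero?
    until @sources.empty?
      chunk = @sources.first.read(maxlen)
      if chunk.nil? || chunk.empty?
        @sources.shift                 # current source is done
      else
        return outbuf.replace(chunk)   # hand the bytes back in the caller's buffer
      end
    end
    outbuf.clear
    raise EOFError, "end of file reached"
  end
end

reader = SequenceReader.new(StringIO.new("1"), StringIO.new("22"), StringIO.new("333\n"))
sink   = StringIO.new
copied = IO.copy_stream(reader, sink)   # number of bytes copied
print sink.string
```

Only one chunk is held in memory at a time, so the same approach should carry over to large File objects in place of the StringIO sources.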

Fravadona answered Feb 17 '26 03:02


