Why does open("url") sometimes return a File and sometimes a StringIO?

Question

I have two CSV files stored on S3. When I open one of them, a File is returned. When I open the other one, a StringIO is returned.

fn1 #=> "http://SOMEWHERE.s3.amazonaws.com/setup_data/d1/file1.csv" 
open(fn1) #=> #<File:/var/folders/sm/k7kyd0ns4k9bhfy7yqpjl2mh0000gn/T/open-uri20140814-26070-11cyjn1> 

fn2 #=> "http://SOMEWHERE.s3.amazonaws.com/setup_data/d2/d3/file2.csv" 
open(fn2) #=> #<StringIO:0x007f9718670ff0>

Why? Is there any way to open them with a consistent data type?

I need to pass the same data type into CSV.read(open(file_url)), which doesn't work when it sometimes gets a File and sometimes a StringIO.

The files were created by different Ruby scripts (they contain very different data).

On my Mac, both appear to be ordinary text CSV files. They were uploaded via the AWS console and have identical permissions and identical metadata (content-type: application/octet-stream).

0xdeadbeef · Accepted Answer

This is by design. open-uri buffers the response in a StringIO and switches to a Tempfile once the body grows larger than 10240 bytes (StringMax). From the source:

StringMax = 10240
def <<(str)
  @io << str
  @size += str.length
  if StringIO === @io && StringMax < @size
    require 'tempfile'
    io = Tempfile.new('open-uri')
    io.binmode
    Meta.init io, @io if Meta === @io
    io << @io.string
    @io = io
  end
end

If you need a StringIO object, you could use fastercsv.
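
The threshold can be observed without any network request. The following is a minimal sketch that feeds data straight into OpenURI::Buffer, the class the snippet above comes from. Note that OpenURI::Buffer is an internal, undocumented class, so this is only a way to watch the switch happen, not something to rely on in application code. The helper name buffer_class_for is made up for this illustration.

```ruby
require 'open-uri'
require 'stringio'
require 'tempfile'

# Hypothetical helper: pushes byte_count bytes into an open-uri buffer and
# reports which class ends up holding the data.
# OpenURI::Buffer is internal to open-uri; it is used here only for observation.
def buffer_class_for(byte_count)
  buffer = OpenURI::Buffer.new
  buffer << ('x' * byte_count)
  io = buffer.io
  io.class
ensure
  # Close the underlying StringIO or Tempfile so nothing is left open.
  io.close if io
end

puts buffer_class_for(10_240) # exactly at StringMax
puts buffer_class_for(10_241) # one byte over StringMax
```

This would explain the two S3 files behaving differently: presumably one body is at most 10240 bytes and the other is larger.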

matt · Answer

CSV::read expects a file path as its argument, not an already opened IO object. It then opens the file and reads the contents. Your code works in the Tempfile case because Ruby calls to_path behind the scenes on anything passed to File::open, and Files respond to this method. What actually happens is that CSV opens another IO on the same file.

Rather than use CSV::read, you could create a new CSV object and call read on that (the instance method, not the class method). CSV::new handles IO objects correctly:

CSV.new(open(file_url)).read
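
To check this without touching S3, here is a small sketch that passes both kinds of object to CSV.new. The StringIO and Tempfile below stand in for what open(file_url) would return for a small and a large body. The helper name rows_from and the sample data are assumptions made for the example. On Ruby 3.0 and later, the URL would need to be opened with URI.open rather than plain open.

```ruby
require 'csv'
require 'stringio'
require 'tempfile'

# Hypothetical helper: parses every row from any IO-like object.
def rows_from(io)
  CSV.new(io).read
end

# Stand-in for a small response, which open-uri keeps in memory.
small = StringIO.new("a,b\n1,2\n")

# Stand-in for a large response, which open-uri spills to a temp file.
large = Tempfile.new('open-uri-demo')
large.write("a,b\n1,2\n")
large.rewind

p rows_from(small)
p rows_from(large)

large.close! # close and delete the temp file
```

If a String is really what is needed, calling read on the returned object and passing the result to CSV.parse should behave the same way for both cases, since StringIO and Tempfile both respond to read.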
