
Why does open("url") sometimes return File sometimes StringIO?

Tags: file, ruby

I have two CSV files stored on S3. When I open one of them, a File is returned. When I open the other one, a StringIO is returned.

fn1 #=> "http://SOMEWHERE.s3.amazonaws.com/setup_data/d1/file1.csv" 
open(fn1) #=> #<File:/var/folders/sm/k7kyd0ns4k9bhfy7yqpjl2mh0000gn/T/open-uri20140814-26070-11cyjn1> 

fn2 #=> "http://SOMEWHERE.s3.amazonaws.com/setup_data/d2/d3/file2.csv" 
open(fn2) #=> #<StringIO:0x007f9718670ff0> 

Why? Is there any way to open them with a consistent data type?

I need to pass a consistent type into CSV.read(open(file_url)), which doesn't work when open sometimes returns a File and sometimes a StringIO.

The two files were created by different Ruby scripts (they contain very different data).

On my Mac, they both appear to be ordinary text CSV files. They were uploaded via the AWS console and have identical permissions and identical metadata (content-type: application/octet-stream).

jpw asked Aug 15 '14 05:08


2 Answers

This is by design. open-uri buffers the response in a StringIO and switches to a Tempfile once the body grows larger than 10240 bytes. From the open-uri source:

StringMax = 10240
def <<(str)
  @io << str
  @size += str.length
  if StringIO === @io && StringMax < @size
    require 'tempfile'
    io = Tempfile.new('open-uri')
    io.binmode
    Meta.init io, @io if Meta === @io
    io << @io.string
    @io = io
  end
end
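
The threshold can be observed without touching the network. This is only a sketch: it pokes at OpenURI::Buffer, which is an internal, undocumented class of open-uri and may change between Ruby versions, and the helper name buffer_class_for is made up for illustration.

```ruby
require 'open-uri'
require 'stringio'
require 'tempfile'

# Hypothetical helper: feed byte_count bytes into open-uri's internal buffer
# and report which class ends up backing it.
# NOTE: OpenURI::Buffer is internal to open-uri (assumption: it is still
# present and still exposes #<< and #io in your Ruby version).
def buffer_class_for(byte_count)
  buf = OpenURI::Buffer.new
  buf << ('x' * byte_count)
  buf.io.class
end

limit = OpenURI::Buffer::StringMax        # 10240 in the source quoted above

small = buffer_class_for(limit)           # not greater than the limit: stays in memory
large = buffer_class_for(limit + 1)       # one byte over: spills to a Tempfile

puts "#{limit} bytes     -> #{small}"
puts "#{limit + 1} bytes -> #{large}"
```

So the two S3 files most likely differ only in size, with one of them at or under 10240 bytes and the other over it.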

If you need a StringIO object, you could use fastercsv.

0xdeadbeef answered Sep 23 '22 15:09


CSV::read expects a file path as its argument, not an already opened IO object. It then opens the file and reads the contents. Your code works in the Tempfile case because Ruby calls to_path behind the scenes on anything passed to File::open, and File objects respond to that method. The result is that CSV opens a second IO on the same file.
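
The to_path difference is easy to check locally. The following is a minimal sketch using stand-in objects instead of real open-uri results; the prefix 'to-path-demo' and the sample CSV text are arbitrary.

```ruby
require 'stringio'
require 'tempfile'

tempfile  = Tempfile.new('to-path-demo')   # stands in for open-uri's large-body result
string_io = StringIO.new("a,b\n1,2\n")     # stands in for the small-body result

# A Tempfile wraps a File, so it can be converted to a path.
# A StringIO lives purely in memory and has no path at all.
tempfile_has_path  = tempfile.respond_to?(:to_path)
string_io_has_path = string_io.respond_to?(:to_path)

puts "Tempfile responds to to_path: #{tempfile_has_path}"
puts "StringIO responds to to_path: #{string_io_has_path}"

# File.open accepts the Tempfile itself because it converts it via to_path,
# opening a second handle on the same file.
same_file = File.open(tempfile) { |f| f.path == tempfile.path }
puts "File.open(tempfile) reopened the same path: #{same_file}"

tempfile.close!   # close and delete the temporary file
```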

Rather than use CSV::read, you could create a new CSV object and call read on that (the instance method, not the class method). CSV::new handles IO objects correctly:

CSV.new(open(file_url)).read
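
To see that this works for either return type, here is a small sketch that avoids the network by building a StringIO and a Tempfile by hand. The sample CSV text and variable names are made up for illustration.

```ruby
require 'csv'
require 'stringio'
require 'tempfile'

csv_text = "id,name\n1,alice\n2,bob\n"     # made-up sample data

# In-memory IO, like open-uri returns for small responses.
string_io = StringIO.new(csv_text)

# File-backed IO, like open-uri returns for large responses.
tempfile = Tempfile.new('csv-demo')
tempfile.write(csv_text)
tempfile.rewind                            # move back to the start before reading

# CSV.new takes any IO; #read parses everything that remains in it.
rows_from_string_io = CSV.new(string_io).read
rows_from_tempfile  = CSV.new(tempfile).read

# Another option is to read the IO into a String and parse that.
rows_via_parse = CSV.parse(StringIO.new(csv_text).read)

tempfile.close!                            # close and delete the temporary file

p rows_from_string_io
p rows_from_tempfile
```

Both calls yield the same array of rows, so the caller no longer needs to care which class open returned.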
matt answered Sep 21 '22 15:09