I have two CSV files stored on S3. When I open one of them, a File is returned. When I open the other one, a StringIO is returned.
fn1 #=> "http://SOMEWHERE.s3.amazonaws.com/setup_data/d1/file1.csv"
open(fn1) #=> #<File:/var/folders/sm/k7kyd0ns4k9bhfy7yqpjl2mh0000gn/T/open-uri20140814-26070-11cyjn1>
fn2 #=> "http://SOMEWHERE.s3.amazonaws.com/setup_data/d2/d3/file2.csv"
open(fn2) #=> #<StringIO:0x007f9718670ff0>
Why? Is there any way to open them with a consistent data type?
I need to pass a consistent type into CSV.read(open(file_url)), which doesn't work when open sometimes returns a File and sometimes a StringIO.
They were created via different ruby scripts (they contain very different data).
On my Mac, they both appear to be ordinary text CSV files. They were uploaded via the AWS console and have identical permissions and identical metadata (content-type: application/octet-stream).
This is by design. A tempfile is created if the size of the object is greater than 10240 bytes. From the source:
StringMax = 10240
def <<(str)
  @io << str
  @size += str.length
  if StringIO === @io && StringMax < @size
    require 'tempfile'
    io = Tempfile.new('open-uri')
    io.binmode
    Meta.init io, @io if Meta === @io
    io << @io.string
    @io = io
  end
end
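The switch can be observed without touching the network. The following is only a sketch: it drives OpenURI::Buffer directly, which is an internal, undocumented class, so its name and behavior are assumed to match the source quoted above and may differ between Ruby versions.

```ruby
require 'open-uri'

# OpenURI::Buffer is internal to open-uri; it is used here purely for illustration.
limit = OpenURI::Buffer::StringMax

small = OpenURI::Buffer.new
small << 'x' * limit            # exactly at the limit: stays in memory
small_class = small.io.class

large = OpenURI::Buffer.new
large << 'x' * (limit + 1)      # one byte over the limit: moved to a Tempfile
large_class = large.io.class

puts small_class
puts large_class
```

This suggests the difference between the two S3 files is simply their size, not their content or metadata.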
If you need a StringIO object, you could use fastercsv.
CSV::read expects a file path as its argument, not an already opened IO object. It then opens the file and reads the contents. Your code works in the Tempfile case because Ruby calls to_path behind the scenes on anything passed to File::open, and Files respond to this method, so CSV ends up opening another IO on the same file. A StringIO has no to_path, which is why that case fails.
Rather than use CSV::read, you could create a new CSV object and call read on that (the instance method, not the class method). CSV::new handles IO objects correctly:
CSV.new(open(file_url)).read
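As a rough check that this works for both return types, here is a minimal sketch that feeds a StringIO and a Tempfile through the same code path. The helper name rows_from_io and the sample data are made up for illustration; local objects stand in for what open would return.

```ruby
require 'csv'
require 'stringio'
require 'tempfile'

# Hypothetical helper: parse CSV rows from any readable IO-like object.
def rows_from_io(io)
  io.rewind            # start from the beginning regardless of prior reads or writes
  CSV.new(io).read
end

data = "a,b\n1,2\n"

in_memory = StringIO.new(data)          # stands in for a small download

on_disk = Tempfile.new('open-uri-demo') # stands in for a large download
on_disk.write(data)
on_disk.flush

rows_small = rows_from_io(in_memory)
rows_large = rows_from_io(on_disk)

on_disk.close
on_disk.unlink
```

Both calls go through CSV.new, so the caller never needs to know which kind of object it received.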