Ruby - How to get the name of a file with open-uri?

I want to download a music file this way:

require 'open-uri'

source_url = "http://soundcloud.com/stereo-foo/cohete-amigo/download"

attachment_file = "test.wav"

open(attachment_file, "wb") do |file|  
  file.print open(source_url).read
end

In that example I want to replace "test.wav" with the real file name (as the JDownloader program does, for example).

EDIT: I don't mean the temporary file; I mean the name the file is stored under on the web, as JDownloader reports it: "Cohete Amigo - Stereo Foo.wav"

Thank you for reading.

UPDATE:

I've tried this to store the name:

attachment_file = File.basename(open(source_url))

I suspect that makes no sense, but I don't know how to do it, sorry.

ElektroStudios asked Nov 15 '12 08:11


2 Answers

The filename is sent in the Content-Disposition response header. Decoding this field can be a little tricky, however. See this discussion, for example:

How to encode the filename parameter of Content-Disposition header in HTTP?

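With open-uri you can read all the response headers through the meta accessor of the object it returns (a Tempfile or StringIO extended with OpenURI::Meta). On Ruby 3.0 and later the call must be URI.open, because Kernel#open no longer accepts URLs:

require 'open-uri'

f = URI.open('http://soundcloud.com/stereo-foo/cohete-amigo/download')
f.meta['content-disposition']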
=> "attachment;filename=\"Stereo Foo - Cohete Amigo.wav\""

So in order to decode something like that you could do this:

cd = f.meta['content-disposition']
filename = cd.match(/filename=(\"?)(.+)\1/)[2]
=> "Stereo Foo - Cohete Amigo.wav"

It works for your particular case, and it also works when the quotes are absent. More complex Content-Disposition values can cause trouble, though: non-ASCII filenames are normally sent in the extended form filename*=UTF-8''..., which this regular expression does not handle. I have not confirmed or tested whether SoundCloud ever sends that form, so you may not need to worry about it.
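As a rough sketch of how the more complex cases could be handled, the helper below prefers the extended filename*=UTF-8''... form and falls back to the plain quoted or unquoted form. The method name filename_from_content_disposition is hypothetical (it is not part of open-uri), and only the UTF-8 charset is handled; treat it as a starting point rather than a complete header parser.

```ruby
# Hypothetical helper, not part of open-uri: extract a filename from a
# Content-Disposition header value. Returns nil when none is found.
def filename_from_content_disposition(header)
  return nil if header.nil? || header.empty?

  name =
    if (m = header.match(/filename\*\s*=\s*UTF-8'[^']*'([^;\s]+)/i))
      # Extended form: percent-encoded UTF-8 bytes, e.g. Caf%C3%A9.wav
      m[1].b.gsub(/%(\h\h)/) { Regexp.last_match(1).hex.chr }
          .force_encoding(Encoding::UTF_8)
    elsif (m = header.match(/filename\s*=\s*"((?:\\.|[^"\\])*)"/i))
      # Quoted form: drop the backslash from escaped characters
      m[1].gsub(/\\(.)/, '\1')
    elsif (m = header.match(/filename\s*=\s*([^;\s"]+)/i))
      # Unquoted token form
      m[1]
    end

  # Keep only the last path component so a hostile header cannot
  # point outside the download directory.
  name && File.basename(name)
end
```

With open-uri, the value passed in would be f.meta['content-disposition'] from the snippet above.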

You could also use a more advanced web-crawling framework like Mechanize, and trust it to do the decoding for you:

require 'mechanize'

agent = Mechanize.new
file = agent.get('http://soundcloud.com/stereo-foo/cohete-amigo/download')
file.filename
=> "Stereo_Foo_-_Cohete_Amigo.wav"

Casper answered Nov 13 '22 02:11


File.basename(open(source_url)) won't work because open(source_url) returns an I/O handle of some sort, not a string like File.basename expects.

File.basename(source_url)

would have a better chance of working, unless the URL is using some path/to/service/with/parameters/in/line/like/this type encoding.

Ruby's URI library has useful tools to help here though. Something like:

File.basename(URI.parse(source_url).path)

would be a starting point. For instance:

require 'uri'

File.basename(URI.parse('http://www.example.com/path/to/file/index.html').path)
# => "index.html"

and:

File.basename(URI.parse('http://www.example.com/path/to/file/index.html?foo=bar').path)
# => "index.html"
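As a minimal sketch of the same idea wrapped in a reusable method: the name filename_from_url and the default: fallback are assumptions made for illustration. It also decodes percent-escapes and falls back to a default when the path ends in a slash and so names no file.

```ruby
require 'uri'

# Hypothetical helper: derive a local filename from the URL path alone.
def filename_from_url(url, default: 'download.bin')
  path = URI.parse(url).path.to_s
  # An empty path or a trailing slash names no file, so use the fallback.
  return default if path.empty? || path.end_with?('/')

  # Decode percent-escapes (such as %20) before taking the last component.
  decoded = path.b.gsub(/%(\h\h)/) { Regexp.last_match(1).hex.chr }
                .force_encoding(Encoding::UTF_8)
  File.basename(decoded)
end
```

For the SoundCloud URL in the question this returns just "download", which is why the Content-Disposition header is the better source when the server sends one.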

Do you know if I can retrieve the file size too, and how?

A great way to test HTTP code locally is to run gem server from the command line and let RubyGems start a small web server for its documentation:

require 'open-uri'

# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
html_doc = URI.open('http://0.0.0.0:8808/') do |io|
  puts io.size
  io.read
end

puts html_doc.size

# => 114350
# => 114350

When you use a block with OpenURI's open command, the block variable gives you access to information about the response. It is either a Tempfile or, for small responses, a StringIO, extended with OpenURI's metadata methods. Both respond to size, so you can use it to find out the size of the downloaded content.

That's fine for small files, but for a big file you might want to use Net::HTTP to send a HEAD request, whose response might include the size in its Content-Length header. I say might because sometimes the server doesn't know how much will be returned, as with dynamic content or content produced by a CGI or sub-service that doesn't bother to say.

The advantage of a HEAD request is that the server returns only the headers, not the entire content. So, in the past, I've sent a HEAD request first to see if I could get the data I needed. If not, I'd be forced to pull in the full response using a normal GET.
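A HEAD request along those lines might look like the sketch below. The method name remote_file_size is hypothetical, and the sketch does not follow redirects, so a download URL that redirects elsewhere would need extra handling.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: request the headers only and return the advertised
# size in bytes, or nil when the server sends no Content-Length.
def remote_file_size(url)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)
  end

  # Redirects and errors are not followed or retried in this sketch.
  return nil unless response.is_a?(Net::HTTPSuccess)

  # Net::HTTPHeader#content_length returns an Integer, or nil if absent.
  response.content_length
end
```

When this returns nil, the fallback described above still applies: fetch the resource with a normal GET and measure what arrives.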

the Tin Man answered Nov 13 '22 03:11