 

Using wget via Ruby on Rails

I want to build a simple website that, when a client requests it, downloads a webpage such as www.example.com/index.html and stores a snapshot of it on the server. I'm thinking about using the wget command to download the page. Would Ruby on Rails be able to handle this task?

Asked by Paul S. on Oct 08 '12 at 20:10



1 Answer

Yes.

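You can run shell commands from Ruby with backticks, exec, and system. Note that each one behaves differently: backticks return the command's standard output as a string, system returns true, false, or nil depending on whether the command succeeded, failed, or could not be run, and exec replaces the current Ruby process with the command and never returns (so avoid it inside a running Rails server):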

  1. backticks:

    `wget http://www.yahoo.com`
    
  2. exec:

    exec('wget http://www.yahoo.com')
    
  3. system:

    system('wget http://www.yahoo.com')
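To see the difference in return values without touching the network, here is a small sketch that uses harmless commands (echo, true, false, and a deliberately non-existent command name) in place of wget. It assumes a Unix-like system where those commands exist.

```ruby
# Backticks run the command and return its standard output as a String.
output = `echo hello`
# $? holds the Process::Status of the last child process.
echo_succeeded = $?.success?

# system returns true for exit status 0, false for a non-zero status,
# and nil when the command cannot be run at all.
ran_ok = system('true')
ran_failed = system('false')
not_found = system('no-such-command-xyz')

# exec is not called here: it replaces the current Ruby process
# (in a Rails app, the server process itself) and never returns.
```

Running this in a console should give "hello\n" for output, true for ran_ok, false for ran_failed, and nil for not_found.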
    

This blog post seems to be in the same vein as what you're trying to do.

Additionally, there are several terrific Ruby libraries for doing this:

  1. mechanize with mechanize download - check out this railscast
  2. httparty - a simple wrapper around a more-difficult-to-use HTTP library. Once you get the response body, you will need to save it to the database or to a file.
  3. typhoeus - a simple mechanism for making HTTP requests in parallel, if you need that ability

They provide a much cleaner Ruby interface for working with the data that comes back from the various requests.
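If you would rather avoid both wget and an extra gem, the fetch-and-save step can also be sketched with Ruby's standard library Net::HTTP. Note that this swaps in Net::HTTP for illustration; it is not the httparty, mechanize, or typhoeus API. The helper name save_page and its dir argument are hypothetical, and the sketch does not follow redirects or fetch images and stylesheets.

```ruby
require 'net/http'
require 'uri'
require 'fileutils'

# Hypothetical helper: fetch a page with Net::HTTP and write the
# response body into dir. Returns the path of the saved file.
def save_page(url, dir)
  uri = URI(url)
  response = Net::HTTP.get_response(uri)
  # Only save successful (2xx) responses; redirects and errors raise here.
  unless response.is_a?(Net::HTTPSuccess)
    raise "GET #{url} failed with status #{response.code}"
  end

  FileUtils.mkdir_p(dir)
  # Use the last path segment as the file name, falling back to
  # index.html when the URL has no file name (e.g. "/" or "").
  name = File.basename(uri.path)
  name = 'index.html' if name.empty? || name == '/'
  path = File.join(dir, name)
  File.binwrite(path, response.body)
  path
end
```

In the Rails console you could try something like save_page('http://www.example.com/index.html', Rails.root.join('tmp', 'snapshots').to_s).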


The best way to test all of these options is to use the Rails console. Go to the root directory of your Rails app and type:

rails c

Once in the console, you can emulate the actual server calls.

Running wget from the console will drop the files into the current working directory, which for the console is your Rails root directory; that is not what you want. tmp is the standard directory for such things. You can generate the path dynamically from the URL like so:

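require 'digest/md5'

url = 'http://www.example.com/index.html'
# tmp directory
path = Rails.root.join('tmp')
# create sub-directory as md5 hash based on URL
sub_dir = Digest::MD5.hexdigest(url)
# append sub_dir on the path
destination_path = path.join(sub_dir)
# pass arguments separately so the URL is never interpreted by a shell
system('wget', '-P', destination_path.to_s, url)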
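Outside the console you will usually also want to know whether wget actually succeeded. The sketch below splits the steps above into small helpers and runs wget through Open3 so its exit status and output can be checked. It assumes wget is on the PATH; the helper names and the base argument are hypothetical (in a Rails app you would pass Rails.root.join('tmp')).

```ruby
require 'digest/md5'
require 'open3'
require 'pathname'

# Hypothetical helper: one sub-directory per URL, named by the URL's MD5 hash.
def snapshot_dir(base, url)
  Pathname.new(base).join(Digest::MD5.hexdigest(url))
end

# Build wget's argument list as an array so no shell is involved.
def wget_command(dir, url)
  ['wget', '--quiet', '-P', dir.to_s, url]
end

# Run wget, raising with its combined stdout/stderr if it exits non-zero.
def download_snapshot(base, url)
  dir = snapshot_dir(base, url)
  out, status = Open3.capture2e(*wget_command(dir, url))
  raise "wget failed (#{status.exitstatus}): #{out}" unless status.success?
  dir
end
```

Keeping the command as a plain array makes it easy to inspect in the console before running it, and it is where you would append any additional wget options you need.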

Be sure to also include the options from this post.

Answered by Matt Dressel on Sep 28 '22 at 06:09
