
Ruby Net::HTTP - following 301 redirects

Tags:

ruby

My users submit urls (to mixes on mixcloud.com) and my app uses them to perform web requests.

A good url returns a 200 status code:

    require 'net/http'

    uri = URI.parse("http://www.mixcloud.com/ErolAlkan/hard-summer-mix/")
    request = Net::HTTP.get_response(uri)
    # => #<Net::HTTPOK 200 OK readbody=true>

But if you forget the trailing slash then our otherwise good url returns a 301:

    uri = "http://www.mixcloud.com/ErolAlkan/hard-summer-mix"
    # => #<Net::HTTPMovedPermanently 301 MOVED PERMANENTLY readbody=true>

The same thing happens with 404's:

    # bad path returns a 404
    "http://www.mixcloud.com/bad/path/"

    # bad path minus trailing slash returns a 301
    "http://www.mixcloud.com/bad/path"

  1. How can I 'drill down' into the 301 to see if it takes us on to a valid resource or an error page?
  2. Is there a tool that provides a comprehensive overview of the rules that a particular domain might apply to their urls?
stephenmurdoch asked Aug 26 '11 20:08





1 Answer

301 redirects are fairly common if you do not type the URL exactly as the web server expects it. They happen much more frequently than you'd think; you just don't normally notice them while browsing because the browser follows them automatically.

Two alternatives come to mind:

1: Use open-uri

open-uri handles redirects automatically. So all you'd need to do is:

    require 'open-uri'
    ...
    # On Ruby 3.0 and later, call URI.open; plain open no longer accepts URLs.
    response = URI.open('http://xyz...').read

If you have trouble redirecting between HTTP and HTTPS, then have a look at this question for a solution: "Ruby open-uri redirect forbidden".
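To check whether a redirect ends at a valid resource or an error page with open-uri, one option is to look at the status of the final response. The sketch below is only an illustration: the method name `status_after_redirects` and the `opener:` keyword (there so the method can be exercised without a network connection) are my own assumptions, not part of open-uri. It relies on open-uri raising `OpenURI::HTTPError` when the last response in the chain is an error such as a 404.

```ruby
require 'open-uri'

# Hypothetical helper: returns the status code (as a String) of the final
# response after open-uri has followed any redirects.
# `opener` is an assumed injection point; by default it is URI.open.
def status_after_redirects(url, opener: URI.method(:open))
  # With a block, URI.open closes the IO for us and returns the block's value.
  opener.call(url) { |io| io.status.first }
rescue OpenURI::HTTPError => e
  # For error responses, e.io carries the status of the failing response.
  e.io.status.first
end
```

Called as `status_after_redirects("http://www.mixcloud.com/bad/path")`, this should give "404" if the redirect target turns out to be an error page, and "200" if it is a good one.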

2: Handle redirects with Net::HTTP

    require 'net/http'

    def get_response_with_redirect(uri)
      r = Net::HTTP.get_response(uri)
      # On a 301, follow the Location header once
      if r.code == "301"
        r = Net::HTTP.get_response(URI.parse(r['location']))
      end
      r
    end
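The method above follows a single 301 and expects an absolute Location header. A slightly more general version might look like the following sketch. It follows any redirect status that carries a Location header, resolves relative locations against the current URL, and gives up after a fixed number of hops. The names `get_response_following_redirects` and `TooManyRedirects`, and the `limit:` and `fetch:` keywords, are made up for this example; `fetch:` is only there so the method can be tried without hitting the network.

```ruby
require 'net/http'
require 'uri'

# Hypothetical error class for this sketch.
class TooManyRedirects < StandardError; end

# Follows up to `limit` redirects and returns the final response.
# `fetch` is an assumed injection point; by default it is Net::HTTP.get_response.
def get_response_following_redirects(uri, limit: 5, fetch: Net::HTTP.method(:get_response))
  uri = URI(uri) # accepts a String or a URI object
  (limit + 1).times do
    response = fetch.call(uri)
    # Anything that is not a redirect with a Location header is the final answer.
    return response unless response.is_a?(Net::HTTPRedirection) && response['location']
    # Location may be relative, so resolve it against the current URL.
    uri = URI.join(uri.to_s, response['location'])
  end
  raise TooManyRedirects, "gave up after #{limit} redirects"
end
```

With this, `get_response_following_redirects("http://www.mixcloud.com/bad/path")` would return the response at the end of the chain, so you can check whether it is a `Net::HTTPOK` or a `Net::HTTPNotFound`.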

If you want to be even smarter you could try to add or remove a trailing slash on the URL when you get a 404 response. You could do that by creating a method like get_response_smart which handles this URL fiddling in addition to the redirects.
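A rough sketch of what such a get_response_smart could look like is below. Treat it as one possible shape rather than a finished solution: the helper `toggle_trailing_slash` and the `limit:` and `fetch:` keywords are assumptions made for this example (`fetch:` lets you try it without a network connection). It follows redirects, and on the first 404 it retries once with the trailing slash added or removed.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: returns a copy of `uri` with the trailing slash
# on its path added if missing, or removed if present.
def toggle_trailing_slash(uri)
  toggled = uri.dup
  toggled.path = uri.path.end_with?('/') ? uri.path.chomp('/') : "#{uri.path}/"
  toggled
end

# Follows up to `limit` redirects and, on the first 404, retries once with
# the trailing slash toggled. Returns whatever response it ends up with.
# `fetch` is an assumed injection point; by default it is Net::HTTP.get_response.
def get_response_smart(uri, limit: 5, fetch: Net::HTTP.method(:get_response))
  uri = URI(uri)
  redirects = 0
  retried = false
  loop do
    response = fetch.call(uri)
    if response.is_a?(Net::HTTPRedirection) && response['location'] && redirects < limit
      redirects += 1
      uri = URI.join(uri.to_s, response['location'])
    elsif response.is_a?(Net::HTTPNotFound) && !retried
      retried = true
      uri = toggle_trailing_slash(uri)
    else
      return response
    end
  end
end
```

Because the slash is toggled only once and redirects are capped, the loop always terminates. If the redirect limit is reached, the last redirect response is returned as-is.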

Casper answered Oct 04 '22 15:10
