How would I go about checking if a URL exists using Ruby?
For example, for the URL
https://google.com the result should be truthy, but for the URLs
https://no.such.domain or
https://stackoverflow.com/no/such/path the result should be falsey
Use the Net::HTTP library.
require "net/http" url = URI.parse("http://www.google.com/") req = Net::HTTP.new(url.host, url.port) res = req.request_head(url.path) At this point res is a Net::HTTPResponse object containing the result of the request. You can then check the response code:
do_something_with_it(url) if res.code == "200" Note: To check for https based url, use_ssl attribute should be true as:
require "net/http" url = URI.parse("https://www.google.com/") req = Net::HTTP.new(url.host, url.port) req.use_ssl = true res = req.request_head(url.path)
Sorry for the late reply on this, but I think this deserves a better answer.
There are three ways to look at this question:
While 200 means that the server answers to that URL (thus, the URL exists), answering other status code doesn't means that the URL does not exist. For example, answering 302 - redirected means that the URL exists and is redirecting to another one. While browsing, 302 many times behaves the same than 200 to the final user. Other status code that can be returned if a URL exists is 500 - internal server error. After all, if the URL does not exists, how it comes the application server processed your request instead return simply 404 - not found?
So there are actually only two cases when a URL does not exist: When the server does not exist or when the server exists but can't find the given URL path does not exist. Thus, the only way to check if the URL exists is checking if the server answers and the return code is not 404. The following code does just that.
require "net/http" def url_exist?(url_string) url = URI.parse(url_string) req = Net::HTTP.new(url.host, url.port) req.use_ssl = (url.scheme == 'https') path = url.path if url.path.present? res = req.request_head(path || '/') res.code != "404" # false if returns 404 - not found rescue Errno::ENOENT false # false if can't find the server end However, most of the times we are not interested in see if a URL exists, but if we can access it. Fortunately looking to the HTTP status codes families, that is the 4xx family, which states for client error (thus, an error in your side, which means you are not requesting the page correctly, don't have permission or whatsoever). This is a good of errors to check if you can access this page. From wiki:
The 4xx class of status code is intended for cases in which the client seems to have erred. Except when responding to a HEAD request, the server should include an entity containing an explanation of the error situation, and whether it is a temporary or permanent condition. These status codes are applicable to any request method. User agents should display any included entity to the user.
So the following code make sure the URL exists and you can access it:
require "net/http" def url_exist?(url_string) url = URI.parse(url_string) req = Net::HTTP.new(url.host, url.port) req.use_ssl = (url.scheme == 'https') path = url.path if url.path.present? res = req.request_head(path || '/') if res.kind_of?(Net::HTTPRedirection) url_exist?(res['location']) # Go after any redirect and make sure you can access the redirected URL else res.code[0] != "4" #false if http code starts with 4 - error on your side. end rescue Errno::ENOENT false #false if can't find the server end Just like the 4xx family checks if you can access the URL, the 5xx family checks if the server had any problem answering your request. An error on this family most of the times are due problems on the server itself, and hopefully they are working on solve it. If You need to be able to access the page and get a correct answer now, you should make sure the answer is not from 4xx or 5xx family, and if you was redirected, the redirected page answers correctly. So much similar to (2), you can simply use the following code:
require "net/http" def url_exist?(url_string) url = URI.parse(url_string) req = Net::HTTP.new(url.host, url.port) req.use_ssl = (url.scheme == 'https') path = url.path if url.path.present? res = req.request_head(path || '/') if res.kind_of?(Net::HTTPRedirection) url_exist?(res['location']) # Go after any redirect and make sure you can access the redirected URL else ! %W(4 5).include?(res.code[0]) # Not from 4xx or 5xx families end rescue Errno::ENOENT false #false if can't find the server end
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With