How would I go about checking if a URL exists using Ruby?
For example, for the URL
https://google.com
the result should be truthy, but for the URLs
https://no.such.domain
or
https://stackoverflow.com/no/such/path
the result should be falsey
Use the Net::HTTP library.
require "net/http" url = URI.parse("http://www.google.com/") req = Net::HTTP.new(url.host, url.port) res = req.request_head(url.path)
At this point res
is a Net::HTTPResponse object containing the result of the request. You can then check the response code:
do_something_with_it(url) if res.code == "200"
Note: To check for https
based url, use_ssl
attribute should be true
as:
require "net/http" url = URI.parse("https://www.google.com/") req = Net::HTTP.new(url.host, url.port) req.use_ssl = true res = req.request_head(url.path)
Sorry for the late reply on this, but I think this deserves a better answer.
There are three ways to look at this question:
While 200
means that the server answers to that URL (thus, the URL exists), answering other status code doesn't means that the URL does not exist. For example, answering 302 - redirected
means that the URL exists and is redirecting to another one. While browsing, 302
many times behaves the same than 200
to the final user. Other status code that can be returned if a URL exists is 500 - internal server error
. After all, if the URL does not exists, how it comes the application server processed your request instead return simply 404 - not found
?
So there are actually only two cases when a URL does not exist: When the server does not exist or when the server exists but can't find the given URL path does not exist. Thus, the only way to check if the URL exists is checking if the server answers and the return code is not 404. The following code does just that.
require "net/http" def url_exist?(url_string) url = URI.parse(url_string) req = Net::HTTP.new(url.host, url.port) req.use_ssl = (url.scheme == 'https') path = url.path if url.path.present? res = req.request_head(path || '/') res.code != "404" # false if returns 404 - not found rescue Errno::ENOENT false # false if can't find the server end
However, most of the times we are not interested in see if a URL exists, but if we can access it. Fortunately looking to the HTTP status codes families, that is the 4xx
family, which states for client error (thus, an error in your side, which means you are not requesting the page correctly, don't have permission or whatsoever). This is a good of errors to check if you can access this page. From wiki:
The 4xx class of status code is intended for cases in which the client seems to have erred. Except when responding to a HEAD request, the server should include an entity containing an explanation of the error situation, and whether it is a temporary or permanent condition. These status codes are applicable to any request method. User agents should display any included entity to the user.
So the following code make sure the URL exists and you can access it:
require "net/http" def url_exist?(url_string) url = URI.parse(url_string) req = Net::HTTP.new(url.host, url.port) req.use_ssl = (url.scheme == 'https') path = url.path if url.path.present? res = req.request_head(path || '/') if res.kind_of?(Net::HTTPRedirection) url_exist?(res['location']) # Go after any redirect and make sure you can access the redirected URL else res.code[0] != "4" #false if http code starts with 4 - error on your side. end rescue Errno::ENOENT false #false if can't find the server end
Just like the 4xx
family checks if you can access the URL, the 5xx
family checks if the server had any problem answering your request. An error on this family most of the times are due problems on the server itself, and hopefully they are working on solve it. If You need to be able to access the page and get a correct answer now, you should make sure the answer is not from 4xx
or 5xx
family, and if you was redirected, the redirected page answers correctly. So much similar to (2), you can simply use the following code:
require "net/http" def url_exist?(url_string) url = URI.parse(url_string) req = Net::HTTP.new(url.host, url.port) req.use_ssl = (url.scheme == 'https') path = url.path if url.path.present? res = req.request_head(path || '/') if res.kind_of?(Net::HTTPRedirection) url_exist?(res['location']) # Go after any redirect and make sure you can access the redirected URL else ! %W(4 5).include?(res.code[0]) # Not from 4xx or 5xx families end rescue Errno::ENOENT false #false if can't find the server end
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With