I have the following 3 strings...
a = "The URL is www.google.com"
b = "The URL is google.com"
c = "The URL is http://www.google.com"
Ruby's URI.extract method returns the URL only for the third string, because that is the only one that includes the http:// scheme.
URI.extract(a)
=> []
URI.extract(b)
=> []
URI.extract(c)
=> ["http://www.google.com"]
How can I create a method to detect and return the URL in all 3 instances?
Use a regular expression.
Here is a basic one that should work for most cases:
/(https?:\/\/)?\w*\.\w+(\.\w+)*(\/\w+)*(\.\w*)?/.match( a ).to_s
This returns only the first URL in the string, as a string (an empty string if nothing matches).
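If you want a reusable method, or every URL rather than just the first, something along these lines might work. This is only a sketch: the names URL_PATTERN, first_url and all_urls are made up for illustration, and the groups in the regex have been made non-capturing so that String#scan returns whole matches instead of arrays of groups.

```ruby
# Same pattern as in the answer above, with non-capturing groups (?:...)
# so that String#scan yields the full matched text.
# URL_PATTERN, first_url and all_urls are hypothetical names.
URL_PATTERN = /(?:https?:\/\/)?\w*\.\w+(?:\.\w+)*(?:\/\w+)*(?:\.\w*)?/

# Returns the first URL-like substring, or nil if there is none.
def first_url(text)
  text[URL_PATTERN]
end

# Returns every URL-like substring as an array (empty if there are none).
def all_urls(text)
  text.scan(URL_PATTERN)
end

first_url("The URL is www.google.com")        # => "www.google.com"
first_url("The URL is google.com")            # => "google.com"
first_url("The URL is http://www.google.com") # => "http://www.google.com"
all_urls("See google.com and http://www.example.org/path")
# => ["google.com", "http://www.example.org/path"]
```

Be aware that the pattern is loose: a URL at the end of a sentence keeps the trailing period ("Visit google.com." gives "google.com."), and anything with a dot between word characters, such as a decimal number, will also match.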
There's no perfect solution to this problem: it's fraught with edge cases. However, you might be able to get tolerably good results using something like the regular expressions used by Twitter to extract URLs from tweets (stripping off the extra leading spaces is left as an exercise!):
require './regex.rb'
def extract_url(s)
s[Twitter::Regex[:valid_url]]
end
a = "The URL is www.google.com"
b = "The URL is google.com"
c = "The URL is http://www.google.com"
extract_url(a)
# => " www.google.com"
extract_url(b)
# => " google.com"
extract_url(c)
# => " http://www.google.com"
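For the exercise of stripping the leading space, one possible approach is to call String#strip on the match when there is one. The sketch below passes the pattern in as a parameter so it can run on its own; STAND_IN_PATTERN is a deliberately simple, made-up pattern that also captures a leading whitespace character, and it is not Twitter's regex. In the real case you would pass Twitter::Regex[:valid_url] instead.

```ruby
# Returns the first match of pattern in s with surrounding whitespace
# removed, or nil when nothing matches.
def extract_url(s, pattern)
  match = s[pattern]
  match && match.strip
end

# Hypothetical stand-in for illustration only; NOT Twitter's regex.
# Like the output shown above, its match includes one leading whitespace character.
STAND_IN_PATTERN = /\s(?:https?:\/\/)?[\w-]+(?:\.[\w-]+)+/

extract_url("The URL is www.google.com", STAND_IN_PATTERN)
# => "www.google.com"
extract_url("The URL is http://www.google.com", STAND_IN_PATTERN)
# => "http://www.google.com"
extract_url("no links here", STAND_IN_PATTERN)
# => nil
```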
You seem to be satisfied with Sucrenoir's answer. The essence of Sucrenoir's answer is to identify a URL by assuming that it includes at least one period. If that is the case, Sucrenoir's regex can be simplified (not equivalently, but for the most part) to this:
string[/\S+\.\S+/]
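Applied to the three strings from the question, that would look roughly like this (url_like_token is a made-up method name, used only to wrap the expression):

```ruby
# Returns the first whitespace-delimited token that has a period with at
# least one non-space character on each side, or nil if there is none.
# url_like_token is a hypothetical name.
def url_like_token(string)
  string[/\S+\.\S+/]
end

url_like_token("The URL is www.google.com")        # => "www.google.com"
url_like_token("The URL is google.com")            # => "google.com"
url_like_token("The URL is http://www.google.com") # => "http://www.google.com"
url_like_token("Version 3.14 shipped")             # => "3.14"
url_like_token("no dots here")                     # => nil
```

As the fourth call shows, any token with an inner period qualifies, so this only makes sense when you already expect the string to contain a URL.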