To check whether an entered string is a valid URL, we can use the validators module in Python. Passing the string to the module's url() function returns True if the string is a valid URL and a ValidationFailure(func=url, …) object if it is not.
In JavaScript, you can use the URL constructor to check whether a string is a valid URL: new URL(url) returns a newly created URL object built from the given string and throws a TypeError if the URL is not valid.
The urllib package is Python's URL-handling module. It is used to fetch URLs (Uniform Resource Locators): its urlopen function (in urllib.request) can fetch URLs over a variety of different protocols.
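If "valid" should mean "can actually be fetched", urlopen can serve as the check. The following is a minimal sketch under that assumption: url_is_reachable is a hypothetical helper name, and it needs network access plus a working server behind the URL, so it tests reachability rather than syntax.

```python
import http.client
from urllib.request import urlopen


def url_is_reachable(url, timeout=5):
    """Return True if urlopen() can fetch `url`, False otherwise (hypothetical helper)."""
    try:
        # urlopen follows redirects and raises HTTPError (a URLError, hence
        # an OSError) for 4xx/5xx responses, so reaching the body means success.
        with urlopen(url, timeout=timeout):
            return True
    except ValueError:
        # Raised for strings with no recognisable scheme, e.g. "example".
        return False
    except (OSError, http.client.HTTPException):
        # URLError, timeouts and low-level connection/protocol failures.
        return False
```

Because a request is made, this is slow and depends on the remote server; for a purely syntactic check, one of the parsing approaches on this page is cheaper.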
To find the URLs inside a given string, we can use the findall() function from Python's regular expression module (re). It returns all non-overlapping matches of the pattern in the string as a list of strings; the string is scanned left to right, and matches are returned in the order found.
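A minimal sketch of that approach, assuming a deliberately simple pattern ("http://" or "https://" followed by non-whitespace) rather than a complete URL grammar; find_urls is a hypothetical helper name. The pattern has no capturing groups, because with capturing groups findall returns the group contents instead of the whole match.

```python
import re

# Simplified pattern (an assumption): scheme, then any run of non-whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def find_urls(text):
    """Return every URL-like substring of `text`, in left-to-right order."""
    return URL_PATTERN.findall(text)


print(find_urls("Docs at https://docs.python.org/3/ and http://example.com today"))
# ['https://docs.python.org/3/', 'http://example.com']
```

Note that trailing punctuation such as a closing parenthesis or full stop would be swallowed by \S+, so real text may need a stricter pattern or some post-processing.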
Use the validators package:
>>> import validators
>>> validators.url("http://google.com")
True
>>> validators.url("http://google")
ValidationFailure(func=url, args={'value': 'http://google', 'require_tld': True})
>>> if not validators.url("http://google"):
...     print("not valid")
...
not valid
>>>
Install it from PyPI with pip (pip install validators).
Actually, I think this is the best way.
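from django.core.validators import URLValidator
from django.core.exceptions import ValidationError

# verify_exists is no longer accepted (removed in Django 1.5), so the
# validator is created without arguments and only checks the format.
val = URLValidator()
try:
    val('http://www.google.com')
except ValidationError as e:
    print(e)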
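In old Django versions you could pass verify_exists=True to make the validator actually check that the URL exists; with verify_exists=False it only checked that the URL was well formed. The verify_exists argument was removed in Django 1.5, so current versions of URLValidator only check the format.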
Edit: this question is a duplicate of "How can I check if a URL exists with Django's validators?"
import re

regex = re.compile(
    r'^(?:http|ftp)s?://'  # http://, https://, ftp:// or ftps://
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
    r'localhost|'  # localhost...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
    r'(?::\d+)?'  # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)

print(re.match(regex, "http://www.example.com") is not None)  # True
print(re.match(regex, "example.com") is not None)  # False
A True/False version, based on @DMfll's answer:
try:
    # Python 2
    from urlparse import urlparse
except ImportError:
    # Python 3
    from urllib.parse import urlparse

a = 'http://www.cwi.nl:80/%7Eguido/Python.html'
b = '/data/Python.html'
c = 532
d = u'dkakasdkjdjakdjadjfalskdjfalk'
e = 'https://stackoverflow.com'

def uri_validator(x):
    try:
        result = urlparse(x)
        # Require both a scheme (e.g. "https") and a network location.
        return all([result.scheme, result.netloc])
    except Exception:
        # Non-string input such as 532 makes urlparse raise.
        return False

print(uri_validator(a))
print(uri_validator(b))
print(uri_validator(c))
print(uri_validator(d))
print(uri_validator(e))
Gives:
True
False
False
False
True
Nowadays I use the following, based on Padam's answer:
$ python --version
Python 3.6.5
And this is how it looks:
from urllib.parse import urlparse

def is_url(url):
    try:
        result = urlparse(url)
        return all([result.scheme, result.netloc])
    except ValueError:
        return False
Just use is_url("http://www.asdf.com").
Hope it helps!
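One caveat with the urlparse-based checks above: they accept any scheme (is_url("foo://bar") returns True) and urlparse does not reject embedded spaces in the host. If only web addresses should count, a stricter variant could look like the following sketch. The allowed-scheme set, the whitespace rule and the name is_web_url are my own assumptions, not part of the answers above.

```python
from urllib.parse import urlparse

# Assumption: only http and https count as "valid" here; adjust to taste.
ALLOWED_SCHEMES = {"http", "https"}


def is_web_url(url):
    """Return True if `url` has an allowed scheme and a non-empty netloc (hypothetical helper)."""
    if not isinstance(url, str) or any(ch.isspace() for ch in url):
        # Reject non-strings and any whitespace, which urlparse would tolerate.
        return False
    try:
        result = urlparse(url)
    except ValueError:
        # e.g. an unbalanced IPv6 bracket such as "http://[::1"
        return False
    # urlparse lowercases the scheme, so "HTTP://..." is accepted as well.
    return result.scheme in ALLOWED_SCHEMES and bool(result.netloc)
```

Usage: is_web_url("https://stackoverflow.com") is True, while is_web_url("ftp://ftp.example.org") and is_web_url("http://exa mple.com") are False.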
I landed on this page trying to figure out a sane way to validate strings as "valid" URLs. Here is my solution using Python 3; no extra libraries are required.
See https://docs.python.org/2/library/urlparse.html if you are using Python 2.
See https://docs.python.org/3.0/library/urllib.parse.html if you are using Python 3, as I am.
# "import urllib" alone does not reliably load the urllib.parse submodule.
import urllib.parse
from pprint import pprint

invalid_url = 'dkakasdkjdjakdjadjfalskdjfalk'
valid_url = 'https://stackoverflow.com'

tokens = [urllib.parse.urlparse(url) for url in (invalid_url, valid_url)]

for token in tokens:
    pprint(token)

min_attributes = ('scheme', 'netloc')  # add attrs to your liking

for token in tokens:
    if not all([getattr(token, attr) for attr in min_attributes]):
        error = "'{url}' string has no scheme or netloc.".format(url=token.geturl())
        print(error)
    else:
        print("'{url}' is probably a valid url.".format(url=token.geturl()))
Output:
ParseResult(scheme='', netloc='', path='dkakasdkjdjakdjadjfalskdjfalk', params='', query='', fragment='')
ParseResult(scheme='https', netloc='stackoverflow.com', path='', params='', query='', fragment='')
'dkakasdkjdjakdjadjfalskdjfalk' string has no scheme or netloc.
'https://stackoverflow.com' is probably a valid url.
Here is a more concise function:
from urllib.parse import urlparse

min_attributes = ('scheme', 'netloc')

def is_valid(url, qualifying=min_attributes):
    tokens = urlparse(url)
    return all([getattr(tokens, qualifying_attr)
                for qualifying_attr in qualifying])