Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to validate a url in Python? (Malformed or not)

People also ask

How do you check if a URL is valid or not Python?

To check whether the string entered is a valid URL or not we use the validators module in Python. When we pass the string to the method url() present in the module it returns true(if the string is URL) and ValidationFailure(func=url, …) if URL is invalid.

How do you check if a URL is valid or not?

You can use the URLConstructor to check if a string is a valid URL. URLConstructor ( new URL(url) ) returns a newly created URL object defined by the URL parameters. A JavaScript TypeError exception is thrown if the given URL is not valid.

Is URL a function in Python?

Urllib package is the URL handling module for python. It is used to fetch URLs (Uniform Resource Locators). It uses the urlopen function and is able to fetch URLs using a variety of different protocols.

How do you find the URL in Python?

To find the URLs in a given string we have used the findall() function from the regular expression module of Python. This return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found.


Use the validators package:

>>> import validators
>>> validators.url("http://google.com")
True
>>> validators.url("http://google")
ValidationFailure(func=url, args={'value': 'http://google', 'require_tld': True})
>>> if not validators.url("http://google"):
...     print "not valid"
... 
not valid
>>>

Install it from PyPI with pip (pip install validators).


Actually, I think this is the best way.

from django.core.validators import URLValidator
from django.core.exceptions import ValidationError

val = URLValidator(verify_exists=False)
try:
    val('http://www.google.com')
except ValidationError, e:
    print e

If you set verify_exists to True, it will actually verify that the URL exists, otherwise it will just check if it's formed correctly.

edit: ah yeah, this question is a duplicate of this: How can I check if a URL exists with Django’s validators?


django url validation regex (source):

import re
regex = re.compile(
        r'^(?:http|ftp)s?://' # http:// or https://
        r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|' #domain...
        r'localhost|' #localhost...
        r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})' # ...or ip
        r'(?::\d+)?' # optional port
        r'(?:/?|[/?]\S+)$', re.IGNORECASE)

print(re.match(regex, "http://www.example.com") is not None) # True
print(re.match(regex, "example.com") is not None)            # False

A True or False version, based on @DMfll answer:

try:
    # python2
    from urlparse import urlparse
except:
    # python3
    from urllib.parse import urlparse

a = 'http://www.cwi.nl:80/%7Eguido/Python.html'
b = '/data/Python.html'
c = 532
d = u'dkakasdkjdjakdjadjfalskdjfalk'
e = 'https://stackoverflow.com'

def uri_validator(x):
    try:
        result = urlparse(x)
        return all([result.scheme, result.netloc])
    except:
        return False

print(uri_validator(a))
print(uri_validator(b))
print(uri_validator(c))
print(uri_validator(d))
print(uri_validator(e))

Gives:

True
False
False
False
True

Nowadays, I use the following, based on the Padam's answer:

$ python --version
Python 3.6.5

And this is how it looks:

from urllib.parse import urlparse

def is_url(url):
  try:
    result = urlparse(url)
    return all([result.scheme, result.netloc])
  except ValueError:
    return False

Just use is_url("http://www.asdf.com").

Hope it helps!


I landed on this page trying to figure out a sane way to validate strings as "valid" urls. I share here my solution using python3. No extra libraries required.

See https://docs.python.org/2/library/urlparse.html if you are using python2.

See https://docs.python.org/3.0/library/urllib.parse.html if you are using python3 as I am.

import urllib
from pprint import pprint

invalid_url = 'dkakasdkjdjakdjadjfalskdjfalk'
valid_url = 'https://stackoverflow.com'
tokens = [urllib.parse.urlparse(url) for url in (invalid_url, valid_url)]

for token in tokens:
    pprint(token)

min_attributes = ('scheme', 'netloc')  # add attrs to your liking
for token in tokens:
    if not all([getattr(token, attr) for attr in min_attributes]):
        error = "'{url}' string has no scheme or netloc.".format(url=token.geturl())
        print(error)
    else:
        print("'{url}' is probably a valid url.".format(url=token.geturl()))

ParseResult(scheme='', netloc='', path='dkakasdkjdjakdjadjfalskdjfalk', params='', query='', fragment='')

ParseResult(scheme='https', netloc='stackoverflow.com', path='', params='', query='', fragment='')

'dkakasdkjdjakdjadjfalskdjfalk' string has no scheme or netloc.

'https://stackoverflow.com' is probably a valid url.

Here is a more concise function:

from urllib.parse import urlparse

min_attributes = ('scheme', 'netloc')


def is_valid(url, qualifying=min_attributes):
    tokens = urlparse(url)
    return all([getattr(tokens, qualifying_attr)
                for qualifying_attr in qualifying])