How to validate a URL in Python? (Malformed or not)

People also ask

How do you check if a URL is valid or not Python?

To check whether a string is a valid URL, you can use the third-party validators package. Its url() function returns True if the string is a valid URL and a ValidationFailure(func=url, …) object, which evaluates as false, if it is not.

How do you check if a URL is valid or not?

In JavaScript, you can use the URL constructor to check whether a string is a valid URL. new URL(url) returns a newly created URL object built from the given string, and throws a TypeError if the string is not a valid URL.

Is URL a function in Python?

urllib is the URL handling package in Python's standard library. Its urllib.request module fetches URLs (Uniform Resource Locators) through the urlopen function, which supports a variety of protocols, and its urllib.parse module splits URL strings into their components.

How do you find the URL in Python?

To find the URLs in a given string, use the findall() function from Python's re (regular expression) module. It returns all non-overlapping matches of the pattern in the string as a list of strings. The string is scanned left to right, and matches are returned in the order found.
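As a minimal sketch of the findall() approach, the snippet below pulls http and https links out of a sentence. The pattern, the helper name find_urls, and the sample text are assumptions chosen for illustration; a pattern this simple will also swallow trailing punctuation that touches a URL.

```python
import re

# Deliberately simple, assumed pattern: "http://" or "https://" followed by
# any run of non-whitespace characters.
URL_PATTERN = re.compile(r'https?://\S+')


def find_urls(text):
    """Return every non-overlapping match, in left-to-right order."""
    return URL_PATTERN.findall(text)


sample = "Docs live at https://docs.python.org/3/ and http://example.com/page today"
print(find_urls(sample))
# ['https://docs.python.org/3/', 'http://example.com/page']
```

Finding candidate substrings is a looser task than validating them, so each match may still need one of the checks described in the answers that follow.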

Use the validators package:

>>> import validators
>>> validators.url("http://google.com")
True
>>> validators.url("http://google")
ValidationFailure(func=url, args={'value': 'http://google', 'require_tld': True})
>>> if not validators.url("http://google"):
...     print("not valid")
... 
not valid
>>>

Install it from PyPI with pip (pip install validators).

Actually, I think this is the best way.

from django.core.validators import URLValidator
from django.core.exceptions import ValidationError

val = URLValidator()
try:
    val('http://www.google.com')
except ValidationError as e:
    print(e)

This only checks that the URL is well formed. Older Django versions accepted URLValidator(verify_exists=True), which also verified that the URL actually exists, but the verify_exists argument was removed in Django 1.5.

edit: ah yeah, this question is a duplicate of this: How can I check if a URL exists with Django’s validators?

Django URL validation regex (source):

import re
regex = re.compile(
        r'^(?:http|ftp)s?://' # http://, https://, ftp:// or ftps://
        r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|' #domain...
        r'localhost|' #localhost...
        r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})' # ...or ip
        r'(?::\d+)?' # optional port
        r'(?:/?|[/?]\S+)$', re.IGNORECASE)

print(re.match(regex, "http://www.example.com") is not None) # True
print(re.match(regex, "example.com") is not None)            # False
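To see what this pattern accepts and rejects, here is a small sketch that wraps it in a helper. The name matches_url_regex and the sample inputs are made up for illustration; the pattern is repeated only so that the snippet runs on its own.

```python
import re

# Same pattern as above, repeated so this sketch is self-contained.
URL_REGEX = re.compile(
    r'^(?:http|ftp)s?://'  # http://, https://, ftp:// or ftps://
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'
    r'localhost|'
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
    r'(?::\d+)?'
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)


def matches_url_regex(candidate):
    """Return True only for a str that matches the pattern (hypothetical helper)."""
    return isinstance(candidate, str) and URL_REGEX.match(candidate) is not None


print(matches_url_regex("http://localhost:8000/path"))  # True
print(matches_url_regex("ftp://192.168.0.1"))           # True
print(matches_url_regex("http://google"))               # False: no dot, not localhost, not an IP
print(matches_url_regex("http://999.999.999.999"))      # True: octets are not range-checked
print(matches_url_regex(532))                           # False: not a string
```

The fourth case is worth noting: the pattern checks the shape of the string only, so an impossible IP address still passes.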

A True or False version, based on @DMfll's answer:

try:
    # Python 2
    from urlparse import urlparse
except ImportError:
    # Python 3
    from urllib.parse import urlparse

a = 'http://www.cwi.nl:80/%7Eguido/Python.html'
b = '/data/Python.html'
c = 532
d = u'dkakasdkjdjakdjadjfalskdjfalk'
e = 'https://stackoverflow.com'

def uri_validator(x):
    try:
        result = urlparse(x)
        return all([result.scheme, result.netloc])
    except (AttributeError, TypeError, ValueError):  # non-string input or unparsable URL
        return False

print(uri_validator(a))
print(uri_validator(b))
print(uri_validator(c))
print(uri_validator(d))
print(uri_validator(e))

Gives:

True
False
False
False
True
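These results follow from what urlparse extracts from each input. The sketch below (Python 3 only; the helper name scheme_and_netloc is invented for this example) returns the two fields that the validator checks, which shows why a bare path or a non-string value fails.

```python
from urllib.parse import urlparse


def scheme_and_netloc(candidate):
    """Return (scheme, netloc), or None when urlparse cannot handle the input."""
    try:
        parsed = urlparse(candidate)
    except (AttributeError, TypeError, ValueError):
        return None
    return parsed.scheme, parsed.netloc


for candidate in ('http://www.cwi.nl:80/%7Eguido/Python.html',
                  '/data/Python.html',
                  532):
    print(repr(candidate), '->', scheme_and_netloc(candidate))
# 'http://www.cwi.nl:80/%7Eguido/Python.html' -> ('http', 'www.cwi.nl:80')
# '/data/Python.html' -> ('', '')
# 532 -> None
```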

Nowadays, I use the following, based on Padam's answer:

$ python --version
Python 3.6.5

And this is how it looks:

from urllib.parse import urlparse

def is_url(url):
    try:
        result = urlparse(url)
        return all([result.scheme, result.netloc])
    except ValueError:
        return False

Just use is_url("http://www.asdf.com").

Hope it helps!
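One caveat with this kind of check: any non-empty scheme passes, so a string like 'foo://bar' counts as a URL. If only web URLs are wanted, a possible refinement is to compare the scheme against an allow-list. This is a sketch rather than part of the answer above; is_http_url and ALLOWED_SCHEMES are names invented here.

```python
from urllib.parse import urlparse

# Assumption for this sketch: only web URLs should be accepted.
ALLOWED_SCHEMES = {'http', 'https'}


def is_http_url(url):
    """Return True if url has an allowed scheme and a non-empty netloc."""
    try:
        result = urlparse(url)
    except (AttributeError, TypeError, ValueError):
        # Non-string input or a string that urlparse rejects.
        return False
    return result.scheme in ALLOWED_SCHEMES and bool(result.netloc)


print(is_http_url("http://www.asdf.com"))  # True
print(is_http_url("foo://bar"))            # False: scheme is not in the allow-list
print(is_http_url("www.asdf.com"))         # False: no scheme
```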

I landed on this page trying to figure out a sane way to validate strings as "valid" URLs. Here is my solution using Python 3. No extra libraries are required.

See https://docs.python.org/2/library/urlparse.html if you are using Python 2.

See https://docs.python.org/3/library/urllib.parse.html if you are using Python 3, as I am.

import urllib.parse
from pprint import pprint

invalid_url = 'dkakasdkjdjakdjadjfalskdjfalk'
valid_url = 'https://stackoverflow.com'
tokens = [urllib.parse.urlparse(url) for url in (invalid_url, valid_url)]

for token in tokens:
    pprint(token)

min_attributes = ('scheme', 'netloc')  # add attrs to your liking
for token in tokens:
    if not all([getattr(token, attr) for attr in min_attributes]):
        error = "'{url}' string has no scheme or netloc.".format(url=token.geturl())
        print(error)
    else:
        print("'{url}' is probably a valid url.".format(url=token.geturl()))

Output:

ParseResult(scheme='', netloc='', path='dkakasdkjdjakdjadjfalskdjfalk', params='', query='', fragment='')
ParseResult(scheme='https', netloc='stackoverflow.com', path='', params='', query='', fragment='')
'dkakasdkjdjakdjadjfalskdjfalk' string has no scheme or netloc.
'https://stackoverflow.com' is probably a valid url.

Here is a more concise function:

from urllib.parse import urlparse

min_attributes = ('scheme', 'netloc')


def is_valid(url, qualifying=min_attributes):
    tokens = urlparse(url)
    return all([getattr(tokens, qualifying_attr)
                for qualifying_attr in qualifying])
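The qualifying parameter makes it possible to demand more than a scheme and a network location. The sketch below repeats the function so that it runs on its own and, purely as an illustration, also requires a path; the sample URLs are arbitrary.

```python
from urllib.parse import urlparse

min_attributes = ('scheme', 'netloc')


def is_valid(url, qualifying=min_attributes):
    tokens = urlparse(url)
    # Every listed attribute must be a non-empty string.
    return all(getattr(tokens, attr) for attr in qualifying)


stricter = ('scheme', 'netloc', 'path')  # illustrative choice, not a recommendation

print(is_valid('https://stackoverflow.com'))                      # True
print(is_valid('https://stackoverflow.com', stricter))            # False: path is empty
print(is_valid('https://stackoverflow.com/questions', stricter))  # True
```

Unlike the earlier versions, this function does not catch exceptions, so non-string input raises instead of returning False.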

