Validating URLs in Python

I've been trying to figure out the best way to validate a URL (specifically in Python) but haven't been able to find an answer. There doesn't seem to be one known way to validate a URL; it depends on which URLs you think you may need to validate. I also found it difficult to find an easy-to-read standard for URL structure. I did find RFCs 3986 and 3987, but they cover much more than just how a URL is structured.

Am I missing something, or is there no one standard way to validate a URL?

mp94 asked Mar 06 '14 23:03



4 Answers

The original question is a bit old, but you might also want to look at the Validator-Collection library I released a few months back. It includes high-performing regex-based validation of URLs for compliance against the RFC standard. Some details:

  • Tested against Python 2.7, 3.4, 3.5, 3.6, 3.7, and 3.8
  • No dependencies on Python 3.x, one conditional dependency in Python 2.x (drop-in replacement for Python 2.x's buggy re module)
  • Unit tests that cover 100+ different succeeding/failing URL patterns, including non-standard characters and the like. As close to covering the whole spectrum of the RFC standard as I've been able to find.

It's also very easy to use:

from validator_collection import validators, checkers

checkers.is_url('http://www.stackoverflow.com')
# Returns True

checkers.is_url('not a valid url')
# Returns False

value = validators.url('http://www.stackoverflow.com')
# value set to 'http://www.stackoverflow.com'

value = validators.url('not a valid url')
# raises a validator_collection.errors.InvalidURLError (which is a ValueError)

value = validators.url('https://123.12.34.56:1234')
# value set to 'https://123.12.34.56:1234'

value = validators.url('http://10.0.0.1')
# raises a validator_collection.errors.InvalidURLError (which is a ValueError)

value = validators.url('http://10.0.0.1', allow_special_ips = True)
# value set to 'http://10.0.0.1'

In addition, Validator-Collection includes 60+ other validators, covering IP addresses (IPv4 and IPv6), domains, and email addresses, which folks might find useful as well.
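
The allow_special_ips examples above suggest that URLs pointing at private addresses such as 10.0.0.1 are rejected by default. As a rough, standard-library-only illustration of that kind of check (this is not Validator-Collection's actual implementation, and host_is_special_ip is a hypothetical name), one could combine urllib.parse with the ipaddress module:

```python
import ipaddress
from urllib.parse import urlparse


def host_is_special_ip(url):
    """Return True if the URL's host is an IP literal in a non-public range.

    Hypothetical helper for illustration; not part of Validator-Collection.
    """
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # The host is a domain name rather than an IP literal.
        return False
    return (
        address.is_private
        or address.is_loopback
        or address.is_link_local
        or address.is_reserved
    )


print(host_is_special_ip("http://10.0.0.1"))            # True
print(host_is_special_ip("https://123.12.34.56:1234"))  # False
```

Which ranges count as "special" is an assumption here; the library may draw the line differently.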

Chris Modzelewski answered Oct 17 '22 11:10


This looks like it might be a duplicate of How do you validate a URL with a regular expression in Python?

You should be able to use the urlparse function described there (in urllib.parse on Python 3, in the urlparse module on Python 2).

>>> from urllib.parse import urlparse # python2: from urlparse import urlparse
>>> urlparse('actually not a url')
ParseResult(scheme='', netloc='', path='actually not a url', params='', query='', fragment='')
>>> urlparse('http://google.com')
ParseResult(scheme='http', netloc='google.com', path='', params='', query='', fragment='')

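Call urlparse on the string you want to check, then make sure that the scheme and netloc attributes of the resulting ParseResult are both non-empty. The attributes always exist; for a string that is not a URL they are simply empty, as the first example shows.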
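
A minimal sketch of that check, assuming you only want to accept http and https URLs (looks_like_url and allowed_schemes are illustrative names, not part of urllib):

```python
from urllib.parse import urlparse


def looks_like_url(candidate, allowed_schemes=("http", "https")):
    """Return True if candidate has an allowed scheme and a non-empty netloc."""
    try:
        parts = urlparse(candidate)
    except ValueError:
        # urlparse raises ValueError on some malformed input,
        # for example an unclosed IPv6 bracket in the host.
        return False
    return parts.scheme in allowed_schemes and bool(parts.netloc)


print(looks_like_url("http://google.com"))   # True
print(looks_like_url("actually not a url"))  # False
```

This only confirms that the string has the rough shape of a URL. It does not check the host name or the path against the RFC grammar, so it is looser than the regex-based validators in the other answers.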

bgschiller answered Oct 17 '22 12:10


I would use the validators package. Here is the link to the documentation and installation instructions.

It is as simple as:

import validators
url = 'YOUR URL'
validators.url(url)

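It returns True if the string is a valid URL. Otherwise it returns a failure object (ValidationFailure in older releases of the package) that evaluates as false, rather than the literal False, so test the result with a plain if instead of comparing it to False.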

Tony Hammack answered Oct 17 '22 11:10


You can also try urllib.request: pass the URL to the urlopen function and catch URLError. Note that this checks whether the URL can actually be fetched, which requires network access, rather than whether the string is well formed.

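from urllib.error import URLError
from urllib.request import urlopen

def validate_web_url(url="http://google"):
    """Return True if the URL can be opened, False otherwise."""
    try:
        # Makes a real network request; the timeout keeps it from hanging.
        with urlopen(url, timeout=10):
            return True
    except (URLError, ValueError):
        # URLError: unreachable host or HTTP error status.
        # ValueError: the string has no recognizable scheme.
        return False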

With the default argument, http://google, this would normally return False, since the host name google does not resolve.

Hamza answered Oct 17 '22 12:10