I've been trying to figure out the best way to validate a URL (specifically in Python) but haven't been able to find an answer. There doesn't seem to be one known way to validate a URL; it depends on which URLs you think you may need to validate. I also found it difficult to find an easy-to-read standard for URL structure. I did find RFCs 3986 and 3987, but they contain much more than just how a URL is structured.
Am I missing something, or is there no one standard way to validate a URL?
To check whether a string is a valid URL, we can use the validators module in Python. When we pass the string to the module's url() function, it returns True if the string is a valid URL and a ValidationFailure(func=url, …) object if it is not.
In JavaScript, you can use the URL constructor to check whether a string is a valid URL. new URL(url) returns a newly created URL object defined by the URL parameters. A TypeError exception is thrown if the given URL is not valid.
Using Django's URLValidator: Django's validators module (django.core.validators) contains, among other things, a URL validator. You can check whether a string is a URL by creating an instance of URLValidator and calling it with the string; it raises a ValidationError if the string is not a valid URL.
The original question is a bit old, but you might also want to look at the Validator-Collection library I released a few months back. It includes high-performing regex-based validation of URLs for compliance against the RFC standard. It's also very easy to use:
from validator_collection import validators, checkers
checkers.is_url('http://www.stackoverflow.com')
# Returns True
checkers.is_url('not a valid url')
# Returns False
value = validators.url('http://www.stackoverflow.com')
# value set to 'http://www.stackoverflow.com'
value = validators.url('not a valid url')
# raises a validator_collection.errors.InvalidURLError (which is a ValueError)
value = validators.url('https://123.12.34.56:1234')
# value set to 'https://123.12.34.56:1234'
value = validators.url('http://10.0.0.1')
# raises a validator_collection.errors.InvalidURLError (which is a ValueError)
value = validators.url('http://10.0.0.1', allow_special_ips = True)
# value set to 'http://10.0.0.1'
In addition, Validator-Collection includes 60+ other validators, including ones for IP addresses (IPv4 and IPv6), domains, and email addresses, so it's something folks might find useful.
This looks like it might be a duplicate of How do you validate a URL with a regular expression in Python?
You should be able to use the urlparse function (from urllib.parse) described there.
>>> from urllib.parse import urlparse # python2: from urlparse import urlparse
>>> urlparse('actually not a url')
ParseResult(scheme='', netloc='', path='actually not a url', params='', query='', fragment='')
>>> urlparse('http://google.com')
ParseResult(scheme='http', netloc='google.com', path='', params='', query='', fragment='')
Call urlparse on the string you want to check, then make sure the resulting ParseResult has non-empty values for both scheme and netloc.
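The scheme-and-netloc check can be wrapped in a small helper. This is only a minimal sketch: the function name looks_like_url and the allowed_schemes parameter are made up for illustration, and the check is structural only, so it will accept some strings that a stricter RFC-based validator would reject.

```python
from urllib.parse import urlparse


def looks_like_url(value, allowed_schemes=("http", "https")):
    """Loose structural check: an allowed scheme and a non-empty netloc."""
    try:
        parts = urlparse(value)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs,
        # e.g. an unbalanced IPv6 bracket in the host part.
        return False
    return parts.scheme in allowed_schemes and bool(parts.netloc)


print(looks_like_url("http://google.com"))
print(looks_like_url("actually not a url"))
```

Restricting the scheme is a design choice, not a requirement: without it, a string such as mailto:someone@example.com would still be rejected for having no netloc, while ftp://example.com would pass.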
I would use the validators package. Its documentation includes installation instructions.
It is just as simple as
import validators
url = 'YOUR URL'
validators.url(url)
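It will return True if the string is a valid URL. If it is not, it returns a ValidationFailure object rather than False (newer releases of the package name this ValidationError); that object evaluates as false in a boolean context, so it can still be used directly in an if statement.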
You can also try using urllib.request: pass the URL to the urlopen function and catch the URLError exception. Note that this tests whether the URL can actually be opened over the network, not just whether it is well formed.
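from urllib.error import URLError
from urllib.request import urlopen

def validate_web_url(url="http://google"):
    try:
        urlopen(url)  # performs a real network request
        return True
    except (URLError, ValueError):  # ValueError: no recognizable scheme
        return False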
With the default argument, this would typically return False, because the host name google does not normally resolve.
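Because opening a URL mixes two questions (is the string well formed, and does the server respond), it can help to answer them separately. The following is a rough sketch under that assumption; the function name classify_url, its return labels, and the 5-second default timeout are all invented for illustration.

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def classify_url(url, timeout=5.0):
    """Return 'malformed', 'unreachable', or 'ok' for the given string."""
    try:
        parts = urlparse(url)
    except ValueError:
        return "malformed"
    if parts.scheme not in ("http", "https") or not parts.netloc:
        # Rejected on structure alone; no network access is attempted.
        return "malformed"
    try:
        with urlopen(url, timeout=timeout):
            return "ok"
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses, as are socket
        # timeouts. An HTTP error status (e.g. 404) also lands here.
        return "unreachable"


print(classify_url("not a url"))
```

A result of 'unreachable' says nothing about whether the URL is valid, only that the request failed at the time it was made.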