Is it safe to validate a URL with a regexp?

In my web app I've got a form field where the user can enter a URL. I'm already doing some preliminary client-side validation, and I was wondering whether I could use a regexp to check that the entered string is a valid URL. So, two questions:

  1. Is it safe to do this with a regexp? A URL is a complex beast, and just like you shouldn't use a regexp for parsing HTML, I'm worried that it might be unsuitable for a URL as well.
  2. If it can be done, what would be a good regexp for the task? (I know that Google turns up countless regexps, but I'm worried about their quality.)

My goal is to prevent a situation where the URL appears in the web page and is unusable by the browser.

Vilx- asked Jun 17 '10 00:06




1 Answer

Well... maybe. People often ask a similar question about email addresses, and with those you would need a horrendously complicated regular expression (a couple of pages long, at least) to validate them correctly. I don't think URLs are quite as complicated (the W3C has a document describing their format), but still, any reasonably short regexp you come up with will probably block some valid URLs.
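To illustrate how a short pattern ends up blocking valid URLs, here is a minimal Python sketch. The pattern `SIMPLE_URL` and the helper `looks_like_url` are made up for this example; they are not taken from any answer on this page, and the same idea applies to a JavaScript regexp on the client.

```python
import re

# Hypothetical "reasonably short" pattern, invented for illustration:
# http or https, a dotted host name, and an optional plain path.
SIMPLE_URL = re.compile(
    r"https?://[a-z0-9.-]+\.[a-z]{2,}(/[\w./-]*)?",
    re.IGNORECASE,
)


def looks_like_url(text: str) -> bool:
    """Return True if the whole string matches the simple pattern."""
    return SIMPLE_URL.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        "http://example.com/path",            # ordinary URL
        "http://localhost:8080/",             # valid, but no dotted host and has a port
        "http://example.com/search?q=regex",  # valid, but has a query string
    ]
    for sample in samples:
        print(sample, looks_like_url(sample))
```

With this pattern, `http://example.com/path` passes, but `http://localhost:8080/` and `http://example.com/search?q=regex` are rejected even though a browser handles both without trouble.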

I would suggest thinking about what kinds of URLs you need to accept. Maybe for your purposes, blocking the occasional valid-but-weird submission is fine, and in that case you can use a simple regex that matches most URLs, like the one in Dobiatowski's answer. Or you could use a regex that accepts all valid URLs and a few invalid ones, if that works for you. But I'd be wary of trying to find a regular expression that accepts exactly the valid URLs and nothing else. If you want validation that is as close to foolproof as possible, I'd suggest using client-side validation of the second type (one that lets a few invalid URLs through) and doing a more comprehensive check on the server side, using a URL-parsing library in whatever language you use to process the form data.
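The two-stage approach described above could look roughly like the following Python sketch. Everything named here (`PERMISSIVE_URL`, `ALLOWED_SCHEMES`, `passes_quick_check`, `passes_strict_check`) is an assumption made for illustration, and restricting the strict check to `http` and `https` is my guess at what "usable by the browser" means for a typical form. The permissive pattern stands in for the client-side regexp; the strict check uses the standard library's `urllib.parse.urlsplit` as the server-side library.

```python
import re
from urllib.parse import urlsplit

# Permissive first pass (assumed): a scheme, "://", then any run of
# non-whitespace characters. It lets some invalid URLs through on purpose.
PERMISSIVE_URL = re.compile(r"[a-z][a-z0-9+.-]*://\S+", re.IGNORECASE)

# Assumption: only these schemes are acceptable in the rendered page.
ALLOWED_SCHEMES = {"http", "https"}


def passes_quick_check(text: str) -> bool:
    """Cheap check of the kind you might run client-side."""
    return PERMISSIVE_URL.fullmatch(text.strip()) is not None


def passes_strict_check(text: str) -> bool:
    """Stricter server-side check built on a real URL parser."""
    try:
        parts = urlsplit(text.strip())
        parts.port  # accessing .port raises ValueError for a bad port
    except ValueError:
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.hostname)


if __name__ == "__main__":
    for candidate in [
        "https://example.com/a?b=c",
        "javascript://alert(1)",
        "http://example.com:99999/",
    ]:
        print(candidate, passes_quick_check(candidate), passes_strict_check(candidate))
```

Here `javascript://alert(1)` and `http://example.com:99999/` both get through the quick check but fail the strict one, which is the division of labour the answer recommends: the regexp only filters out obvious junk, and the parser has the final say.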

David Z answered Nov 06 '22 03:11
