
What regular expression do browsers use for HTML5 input type=url?

I'm working on an HTML5 input pattern polyfill and I'm trying to validate an input type=url in JavaScript exactly as the browser (Chrome) does, but I can't find any documentation of a JavaScript- or Perl-compatible regular expression for it. As it's a polyfill, I don't particularly care whether it matches all URLs exactly (which is impossible); what matters is that it imitates how the browser behaves.

Would anyone know of an identical pattern in Perl syntax?

Thanks

Benjamin Solum asked May 16 '12 20:05





2 Answers

After searching through several HTML5 shivs on GitHub to see whether anyone else had come up with an ideal expression, I believe I found one that comes very close, although it doesn't match the browser's behaviour perfectly.

Alexander Farkas (https://github.com/aFarkas/webshim/blob/master/src/shims/form-shim-extend.js#L285) uses this pattern to test URLs:

/^([a-z]([a-z]|\d|\+|-|\.)*):(\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*@)?((\[(|(v[\da-f]{1,}\.(([a-z]|\d|-|\.|_|~)|[!\$&'\(\)\*\+,;=]|:)+))\])|((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=])*)(:\d*)?)(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*|(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)?)|((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)|((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)){0})(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?$/i;

Also, just for anyone who stumbles across this via Google, if you don't need the pattern, but just want to check if something's valid through JavaScript (perhaps onChange), you can use the formelement.checkValidity() method. Obviously this doesn't help with a polyfill (which assumes no native HTML5 validation support) but it is useful nonetheless.
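As a minimal sketch of that check: the helper name isFieldValid is hypothetical, and the code assumes it is handed a form control (for example from document.querySelector). It returns null when the element has no checkValidity() method, which is the case a polyfill would have to handle itself.

```javascript
// Hypothetical helper: asks the browser whether a form control is valid.
// Returns true/false from native validation, or null when the element
// does not expose checkValidity() (no native HTML5 validation support).
function isFieldValid(field) {
  if (!field || typeof field.checkValidity !== "function") {
    return null;
  }
  return field.checkValidity();
}

// Possible browser usage (assumes an <input type="url" id="site"> exists):
// var input = document.getElementById("site");
// input.onchange = function () {
//   console.log(isFieldValid(input));
// };
```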

Benjamin Solum answered Nov 08 '22 02:11



Read the relevant section of the specification at http://www.w3.org/TR/html5/forms.html#url-state-(type=url):

Your polyfill should start by sanitizing the input, i.e. removing line breaks and trimming the string. The sentence "User agents must not allow users to insert "LF" (U+000A) or "CR" (U+000D) characters" may also be relevant.
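The sanitizing step might look roughly like this. The function name sanitizeUrlValue is made up for illustration, and the set of characters trimmed (tab, form feed, space, in addition to the stripped CR and LF) is my reading of what the spec treats as whitespace here, so check it against the spec text before relying on it.

```javascript
// Hypothetical helper: sanitizes a raw value before URL validation.
// 1. Remove every LF (U+000A) and CR (U+000D) anywhere in the string.
// 2. Trim leading and trailing tab, form feed, and space characters.
function sanitizeUrlValue(value) {
  return String(value)
    .replace(/[\r\n]/g, "")
    .replace(/^[\t\f ]+|[\t\f ]+$/g, "");
}
```

For example, sanitizeUrlValue("  http://exa\nmple.com/ \r\n") gives "http://example.com/".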

The result should be a valid, absolute URL. RFCs 3986 and 3987, which the specification references, describe URL validation; the section on parsing URLs may also be of interest.
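Where a URL constructor is available (an assumption: it exists in current browsers and Node.js, but not in the older browsers a polyfill targets), a rough check for "absolute URL" can lean on it instead of a regexp. This is only a sketch; the constructor's parser is not guaranteed to accept and reject exactly the same strings as a given browser's type=url validation, and isAbsoluteUrl is a hypothetical name.

```javascript
// Hypothetical helper: true if `value` parses as an absolute URL.
// new URL(value) without a base argument throws a TypeError for
// relative references and for strings it cannot parse.
function isAbsoluteUrl(value) {
  try {
    new URL(value);
    return true;
  } catch (e) {
    return false;
  }
}
```

With this sketch, "http://example.com/" is accepted, while "example.com" and "/relative/path" are rejected because they have no scheme.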

Your polyfill might not only validate URIs but also resolve relative URIs. In any case, validating a URI will be much simpler with an algorithm than with a suitable regexp. That said, even the RFC gives a regexp for parsing an already validated URI in Appendix B.
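For reference, the Appendix B expression from RFC 3986 translates to JavaScript as below. Note that it only splits a URI into components and matches almost any string, so it is not a validator by itself. The names URI_PARTS and splitUri are mine, not the RFC's.

```javascript
// RFC 3986, Appendix B, with "/" escaped for a JavaScript regex literal.
// Capture groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
var URI_PARTS = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper: splits a URI reference into its five components.
// Components that are absent come back as undefined.
function splitUri(uri) {
  var m = URI_PARTS.exec(uri);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9]
  };
}
```

Applied to the RFC's own example, "http://www.ics.uci.edu/pub/ietf/uri/#Related", this yields scheme "http", authority "www.ics.uci.edu", path "/pub/ietf/uri/", no query, and fragment "Related". A polyfill could then require at least a defined scheme before treating the value as absolute.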

Bergi answered Nov 08 '22 03:11
