Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regex: validate a URL path with no query params

I'm not a regex expert and I'm breaking my head trying to do one that seems very simple and works in python 2.7: validate the path of an URL (no hostname) without the query string. In other words, a string that starts with /, allows alphanumeric values and doesn't allow any other special chars except these: /, ., -

I found this post that is very similar to what I need but for me isn't working at all, I can test with for example aaa and it will return true even if it doesn't start with /.

The current regex that I have kinda working is this one:

[^/+a-zA-Z0-9.-]

but it doesn't work with paths that don't start with /. For example:

  • /aaa -> true, this is ok
  • /aaa/bbb -> true, this is ok
  • /aaa?q=x -> false, this is ok
  • aaa -> true, this is NOT ok
like image 783
Juancho Avatar asked Oct 17 '12 06:10

Juancho


People also ask

How do you check the given URL is valid or not?

You can use the URLConstructor to check if a string is a valid URL. URLConstructor ( new URL(url) ) returns a newly created URL object defined by the URL parameters. A JavaScript TypeError exception is thrown if the given URL is not valid.

Can we use RegEx in URL?

URL regular expressions can be used to verify if a string has a valid URL format as well as to extract an URL from a string.

How do I validate a pattern in RegEx?

To validate a field with a Regex pattern, click the Must match pattern check box. Next, add the expression you want to validate against. Then add the message your users will see if the validation fails. You can save time for your users by including formatting instructions or examples in the question description.


2 Answers

The regex you've defined is a character class. Instead, try:

^\/[/.a-zA-Z0-9-]+$
like image 83
slackwing Avatar answered Sep 19 '22 05:09

slackwing


In other words, a string that starts with /, allows alphanumeric values and doesn't allow any other special chars except these: /, ., -

You are missing some characters that are valid in URLs

import string
import urllib
import urlparse

valid_chars = string.letters + string.digits + '/.-~'
valid_paths = []

urls = ['http://www.my.uni.edu/info/matriculation/enroling.html',
    'http://info.my.org/AboutUs/Phonebook',
    'http://www.library.my.town.va.us/Catalogue/76523471236%2Fwen44--4.98',
    'http://www.my.org/462F4F2D4241522A314159265358979323846',
        'http://www.myu.edu/org/admin/people#andy',
        'http://www.w3.org/RDB/EMP?*%20where%20name%%3Ddobbins']

for i in urls:
   path = urllib.unquote(urlparse.urlparse(i).path)
   if path[0] == '/' and len([i for i in path if i in valid_chars]) == len(path):
        valid_paths.append(path)
like image 33
Burhan Khalid Avatar answered Sep 22 '22 05:09

Burhan Khalid