I'm not a regex expert and I'm breaking my head trying to do one that seems very simple and works in python 2.7: validate the path of an URL (no hostname) without the query string. In other words, a string that starts with /, allows alphanumeric values and doesn't allow any other special chars except these: <code>/</code>, <code>.</code>, <code>-</code> I found this post that is very similar to what I need but for me isn't working at all, I can test with for example <code>aaa</code> and it will return true even if it doesn't start with <code>/</code>. The current regex that I have kinda working is this one: <pre class="prettyprint"><code>[^/+a-zA-Z0-9.-] </code></pre> but it doesn't work with paths that don't start with <code>/</code>. For example: <ul> <li> <code>/aaa</code> -> true, this is ok</li> <li> <code>/aaa/bbb</code> -> true, this is ok</li> <li> <code>/aaa?q=x</code> -> false, this is ok</li> <li> <code>aaa</code> -> true, this is NOT ok</li> </ul>

The regex you've defined is a character class. Instead, try: <pre class="prettyprint"><code>^\/[/.a-zA-Z0-9-]+$ </code></pre>

Regex: validate a URL path with no query params

Tags:

I'm not a regex expert and I'm breaking my head trying to do one that seems very simple and works in python 2.7: validate the path of an URL (no hostname) without the query string. In other words, a string that starts with /, allows alphanumeric values and doesn't allow any other special chars except these: /, ., -

I found this post that is very similar to what I need but for me isn't working at all, I can test with for example aaa and it will return true even if it doesn't start with /.

The current regex that I have kinda working is this one:

[^/+a-zA-Z0-9.-]

but it doesn't work with paths that don't start with /. For example:

/aaa -> true, this is ok
/aaa/bbb -> true, this is ok
/aaa?q=x -> false, this is ok
aaa -> true, this is NOT ok

783

asked Oct 17 '12 06:10

Juancho

2 Answers

The regex you've defined is a character class. Instead, try:

^\/[/.a-zA-Z0-9-]+$

answered Sep 19 '22 05:09

slackwing

In other words, a string that starts with /, allows alphanumeric values and doesn't allow any other special chars except these: /, ., -

You are missing some characters that are valid in URLs

import string
import urllib
import urlparse

valid_chars = string.letters + string.digits + '/.-~'
valid_paths = []

urls = ['http://www.my.uni.edu/info/matriculation/enroling.html',
    'http://info.my.org/AboutUs/Phonebook',
    'http://www.library.my.town.va.us/Catalogue/76523471236%2Fwen44--4.98',
    'http://www.my.org/462F4F2D4241522A314159265358979323846',
        'http://www.myu.edu/org/admin/people#andy',
        'http://www.w3.org/RDB/EMP?*%20where%20name%%3Ddobbins']

for i in urls:
   path = urllib.unquote(urlparse.urlparse(i).path)
   if path[0] == '/' and len([i for i in path if i in valid_chars]) == len(path):
        valid_paths.append(path)

answered Sep 22 '22 05:09

Burhan Khalid

Related questions
                            
                                How do I post unicode characters using httplib?
                            
                                django many-to-many field: prefetch primary keys only
                            
                                python import site failed
                            
                                I have modulus and private exponent. How to construct RSA private key and sign a message?
                            
                                Surf missing in opencv 2.4 for python
                            
                                How does the python difflib.get_close_matches() function work?
                            
                                Running Multiple spiders in scrapy
                            
                                Argparse: expected one argument
                            
                                Events framework for Python? [closed]
                            
                                TypeError:'datetime.datetime' object is not subscriptable
                            
                                Efficiently accumulating a collection of sparse scipy matrices
                            
                                How to run arbitrary code when django shell starts?
                            
                                Flask/Werkzeug debugger, process model, and initialization code
                            
                                Frame buffer module of python
                            
                                Homework - Python Proxy Server
                            
                                Doctest Involving Escape Characters
                            
                                Numpy C API: Link several object files
                            
                                preserve argspec when decorating? [duplicate]
                            
                                ctypes reference double pointer
                            
                                what's the difference between pylint 'disable' and 'disable-msg'?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Regex: validate a URL path with no query params

Tags:

python

regex

url

path

Juancho

People also ask

2 Answers

slackwing

Burhan Khalid

Recent Activity

Donate For Us