Validate URL query string with regex

I'm trying to validate a query string with regex. Note that I'm not trying to extract the values, only to validate the syntax. I'm doing this to practice regex, so I'd appreciate help rather than "use this lib", although seeing how a library does it would help me, so show me if you've got one.

So, this would be the prerequisites:

  • It must start with a question mark.
  • It may contain keys, each with or without a value. A key and its value are separated by an equals sign, and pairs are separated by an ampersand.

I've got pretty far, but I'm having trouble expressing in regex that the equals sign and ampersand must appear in a fixed order without having to repeat match groups. This is what I've got so far:

#^\?([\w\-]+((&|=)([\w\-]+)*)*)?$#

It correctly matches ?abc=123&def=345, but it also incorrectly matches strings such as ?abc=123=456.
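
To see why the second string slips through, here is a small check, sketched in Python (an assumption, since no language is named; the # delimiters are dropped because Python's re module does not use them). The group (&|=) accepts either separator on every repetition, so nothing forces them to alternate:

```python
import re

# The first pattern from the question, without the surrounding # delimiters.
LOOSE = re.compile(r'^\?([\w\-]+((&|=)([\w\-]+)*)*)?$')


def is_valid(query: str) -> bool:
    """Return True when the pattern matches the query string."""
    return LOOSE.match(query) is not None


# Each separator may be & or =, in any order, so all of these match.
for query in ['?abc=123&def=345', '?abc=123=456', '?abc&&=']:
    print(query, is_valid(query))
```

All three strings match, including ?abc&&=. The nested quantifier ([\w\-]+)* is also the kind of construct that can backtrack heavily on long non-matching input.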

I could go overkill and do something like...

/^\?([\w\-]+=?([\w\-]+)?(&[\w\-]+(=?[\w\-]*)?)*)?$/

... but I don't want to repeat the match groups which are the same anyway.

How can I tell regex that the separators between values must alternate between & and = without repeating match groups or causing catastrophic backtracking?

Thank you.

Edit:

I'd like to clarify that this is not meant for a real-world implementation; for that, you should use your language's built-in library, which is most likely available. I'm asking because I want to improve my regex skills, and parsing a query string seemed like a rewarding challenge.

Helge Talvik Söderström asked May 30 '14 16:05


3 Answers

You can use this regex:

^\?([^=]+=[^=]+&)+[^=]+(=[^=]+)?$

What it does is:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  \?                       '?'
--------------------------------------------------------------------------------
  (                        group and capture to \1 (1 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    [^=]+                    any character except: '=' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    =                        '='
--------------------------------------------------------------------------------
    [^=]+                    any character except: '=' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    &                        '&'
--------------------------------------------------------------------------------
  )+                       end of \1 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
--------------------------------------------------------------------------------
  [^=]+                    any character except: '=' (1 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  (                        group and capture to \2 (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    =                        '='
--------------------------------------------------------------------------------
    [^=]+                    any character except: '=' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )?                       end of \2 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \2)
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
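
A few sample strings can be run against this pattern to see what it accepts. The sketch below assumes Python's re module (the thread does not name a language) and uses made-up sample inputs:

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r'^\?([^=]+=[^=]+&)+[^=]+(=[^=]+)?$')


def accepts(query: str) -> bool:
    """Return True when the pattern matches the query string."""
    return PATTERN.match(query) is not None


samples = ['?abc=123&def=345', '?abc=123=456', '?abc=123', '?', '?a=1&&b=2']
for query in samples:
    print(query, accepts(query))
```

The double = is rejected as intended. The group ending in & must match at least once, though, so a single pair such as ?abc=123 and a bare ? do not match, and because [^=] also matches &, a string like ?a=1&&b=2 is accepted.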
Amit Joki answered Sep 27 '22 23:09


This seems to be what you want:

^\?([\w-]+(=[\w-]*)?(&[\w-]+(=[\w-]*)?)*)?$

This treats each "pair" as a key followed by an optional value (which may be blank). There is a first pair, followed by zero or more groups of & plus another pair, and the whole expression (except for the leading ?) is optional. Doing it this way prevents matching ?&abc=def.

Also note that a hyphen doesn't need escaping when it is last in the character class, which allows a slight simplification.

You seem to want to allow hyphens anywhere in keys or values. If keys need to be hyphen-free:

^\?(\w+(=[\w-]*)?(&\w+(=[\w-]*)?)*)?$
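
If the concern is mainly about writing the pair sub-pattern twice, one option is to define it once and build the full expression from it. This is only a sketch, assuming Python; the names PAIR, QUERY and is_valid_query are illustrative, and the compiled pattern still contains the fragment twice even though the source does not:

```python
import re

# One key with an optional, possibly empty value. Written once, reused below.
PAIR = r'[\w-]+(?:=[\w-]*)?'

# A leading ?, then either nothing or a first pair plus any number of &pair groups.
QUERY = re.compile(rf'^\?(?:{PAIR}(?:&{PAIR})*)?$')


def is_valid_query(query: str) -> bool:
    """Return True when the whole string is a syntactically valid query string."""
    return QUERY.fullmatch(query) is not None


for query in ['?', '?abc', '?abc=', '?abc=123&def=345', '?abc=123=456', '?&abc=def', '?abc&']:
    print(query, is_valid_query(query))
```

The first four strings are accepted and the last three are rejected. Engines with subroutine calls, such as PCRE's (?1), can express the same reuse inside the pattern itself; Python's re module does not support that syntax.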
Bohemian answered Sep 27 '22 23:09


I agree with Andy Lester, but a possible regex solution is

#^\?([\w-]+=[\w-]*(&[\w-]+=[\w-]*)*)?$#

which is very much like what you posted.

I haven't tested it, and you didn't say what language you're using, so it may need a little tweaking.
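
For a quick check of this idea, the sketch below (Python assumed; the # delimiters dropped) spells the pattern out with a * after the &-group, so that any number of pairs can follow the first one. Note that this form requires an = after every key:

```python
import re

# Every key must be followed by =, and the value may be empty.
# The trailing &key=value group is repeated zero or more times.
STRICT = re.compile(r'^\?([\w-]+=[\w-]*(&[\w-]+=[\w-]*)*)?$')


def matches(query: str) -> bool:
    """Return True when the pattern matches the query string."""
    return STRICT.match(query) is not None


for query in ['?', '?a=1', '?a=1&b=2&c=3', '?abc', '?abc=123=456']:
    print(query, matches(query))
```

The first three strings match. ?abc is rejected because the key has no =, and ?abc=123=456 is rejected because a second = cannot follow a value.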

ooga answered Sep 27 '22 23:09