I'm trying to validate a query string with regex. Note that I'm not trying to extract the values, only to validate the syntax. I'm doing this to practice regex, so I'd appreciate help rather than "use this lib", although seeing how it has been done in a lib would help me, so show me if you've got one.
So, this would be the prerequisites:
I've got pretty far, but I'm having trouble matching in regex that the equals-sign and ampersand must be in a certain order without having to repeat match groups. This is what I've got so far:
#^\?([\w\-]+((&|=)([\w\-]+)*)*)?$#
It correctly matches ?abc=123&def=345, but it also incorrectly matches, for example, ?abc=123=456.
I could go overkill and do something like...
/^\?([\w\-]+=?([\w\-]+)?(&[\w\-]+(=?[\w\-]*)?)*)?$/
... but I don't want to repeat the match groups which are the same anyway.
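To make the difference concrete, here is a small sketch that runs both patterns above against the two sample strings. Python is an assumption (no language is named here), and the names LOOSE, STRICT and check are made up for illustration. The # and / delimiters are dropped because Python's re module takes the bare pattern.

```python
import re

# First attempt from above: '&' and '=' are interchangeable separators.
LOOSE = re.compile(r"^\?([\w\-]+((&|=)([\w\-]+)*)*)?$")

# "Overkill" version from above: at most one '=' per pair.
STRICT = re.compile(r"^\?([\w\-]+=?([\w\-]+)?(&[\w\-]+(=?[\w\-]*)?)*)?$")


def check(pattern: re.Pattern, query: str) -> bool:
    """Return True if the query string satisfies the anchored pattern."""
    return pattern.match(query) is not None


for query in ("?abc=123&def=345", "?abc=123=456"):
    print(query, "loose:", check(LOOSE, query), "strict:", check(STRICT, query))
```

The first pattern also nests one quantifier inside another, ([\w\-]+)* inside (...)*, which is the usual shape behind catastrophic backtracking on long inputs that fail to match, so it is probably best tried on short strings only.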
How can I tell regex that the separators between values must alternate between & and = without repeating match groups or causing catastrophic backtracking?
Thank you.
Edit:
I'd like to clarify that this is not meant for a real-world implementation; for that, the built-in library in your language, which is most likely available, should be used. I'm asking because I want to improve my regex skills, and parsing a query string seemed like a rewarding challenge.
Match the given URL with the regular expression. In Java, this can be done by using Pattern.matcher(). Return true if the URL matches the given regular expression, else return false.
You can use the URL constructor to check if a string is a valid URL. The URL constructor ( new URL(url) ) returns a newly created URL object defined by the URL parameters. A JavaScript TypeError exception is thrown if the given URL is not valid.
Query string values can be checked using regular expressions. You can select regular expressions from the global White list or enter them manually. For example, if you know that a query string must have a value of ABCD , a regular expression of ^ABCD$ is an exact match test.
URL regular expressions can be used to verify that a string has a valid URL format as well as to extract a URL from a string.
You can use this regex:
^\?([^=]+=[^=]+&)+[^=]+(=[^=]+)?$
What it does is:
NODE                     EXPLANATION
--------------------------------------------------------------------------------
^                        the beginning of the string
--------------------------------------------------------------------------------
\?                       '?'
--------------------------------------------------------------------------------
(                        group and capture to \1 (1 or more times
                         (matching the most amount possible)):
--------------------------------------------------------------------------------
  [^=]+                    any character except: '=' (1 or more
                           times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  =                        '='
--------------------------------------------------------------------------------
  [^=]+                    any character except: '=' (1 or more
                           times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  &                        '&'
--------------------------------------------------------------------------------
)+                       end of \1 (NOTE: because you are using a
                         quantifier on this capture, only the LAST
                         repetition of the captured pattern will be
                         stored in \1)
--------------------------------------------------------------------------------
[^=]+                    any character except: '=' (1 or more times
                         (matching the most amount possible))
--------------------------------------------------------------------------------
(                        group and capture to \2 (optional
                         (matching the most amount possible)):
--------------------------------------------------------------------------------
  =                        '='
--------------------------------------------------------------------------------
  [^=]+                    any character except: '=' (1 or more
                           times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
)?                       end of \2 (NOTE: because you are using a
                         quantifier on this capture, only the LAST
                         repetition of the captured pattern will be
                         stored in \2)
--------------------------------------------------------------------------------
$                        before an optional \n, and the end of the
                         string
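Trying this pattern on a few inputs suggests two things worth knowing before relying on it. The sketch below is in Python (an assumption, as are the names PATTERN and matches): the first group is required at least once and ends in &, so a single key=value pair is rejected, and because [^=] also matches &, an empty pair such as && slips through.

```python
import re

# Pattern from this answer, unchanged.
PATTERN = re.compile(r"^\?([^=]+=[^=]+&)+[^=]+(=[^=]+)?$")


def matches(query: str) -> bool:
    """Return True if the query string satisfies the pattern."""
    return PATTERN.match(query) is not None


samples = [
    "?abc=123&def=345",  # two pairs
    "?abc=123=456",      # doubled '=' inside one pair
    "?abc=123",          # a single pair, no '&'
    "?a=1&&b=2",         # empty pair between two '&'
]
for query in samples:
    print(query, matches(query))
```

Replacing [^=] with [^=&] would close the second gap, at the cost of one more character in each class.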
This seems to be what you want:
^\?([\w-]+(=[\w-]*)?(&[\w-]+(=[\w-]*)?)*)?$
See live demo
This considers each "pair" as a key followed by an optional value (which may be blank), and has a first pair, followed by zero or more further pairs that are each preceded by &, and the whole expression (except for the leading ?) is optional. Doing it this way prevents matching ?&abc=def
Also note that hyphen doesn't need escaping when last in the character class, allowing a slight simplification.
You seem to want to allow hyphens anywhere in keys or values. If keys need to be hyphen free:
^\?(\w+(=[\w-]*)?(&\w+(=[\w-]*)?)*)?$
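On the wish not to repeat the group: the regex itself still needs the pair twice, but the source does not have to. A minimal sketch, assuming Python (the names PAIR, QUERY and is_valid_query are made up for illustration), writes the pair once and interpolates it. It uses non-capturing groups since only validation is wanted.

```python
import re

# One key with an optional, possibly empty value: "abc", "abc=" or "abc=123".
PAIR = r"[\w-]+(?:=[\w-]*)?"

# A leading '?', then optionally one pair followed by any number of '&pair'.
QUERY = re.compile(rf"^\?(?:{PAIR}(?:&{PAIR})*)?$")


def is_valid_query(query: str) -> bool:
    """Return True if the query string has valid key[=value] syntax."""
    return QUERY.match(query) is not None


for query in ("?abc=123&def=345", "?abc=123=456", "?&abc=def", "?", "?abc=&def"):
    print(query, is_valid_query(query))
```

Engines such as PCRE also offer subroutine calls like (?1) that reuse a group's pattern inside the regex itself; Python's re module does not support them, which is why the sketch builds the string instead.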
I agree with Andy Lester, but a possible regex solution is
#^\?([\w-]+=[\w-]*(&[\w-]+=[\w-]*)*)?$#
which is very much like what you posted.
I haven't tested it and you didn't say what language you're using so it may need a little tweaking.
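Since the pattern is untested, a short check may help. This Python sketch (the language and the names REQUIRE_EQUALS and accepts are assumptions) uses the pattern with a * after the & group, so that any number of pairs is accepted. Unlike the other answers, every key here must be followed by =.

```python
import re

# Every pair must contain '='; the value after it may be empty.
REQUIRE_EQUALS = re.compile(r"^\?([\w-]+=[\w-]*(&[\w-]+=[\w-]*)*)?$")


def accepts(query: str) -> bool:
    """Return True if the query string satisfies the pattern."""
    return REQUIRE_EQUALS.match(query) is not None


for query in ("?abc=123", "?abc=123&def=345", "?abc", "?abc=", "?abc=123=456"):
    print(query, accepts(query))
```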