
Regex to match ampersands in a URI that are followed by an equals and not another ampersand

My regex knowledge is escaping me on this one...

Say I have a URL with a URI as a query parameter, like this:

http://hostname.com?uri=http://website.com/company/YoYo+&+Co+Inc&type=company

...assuming our uri param doesn't contain any params itself, I want to manually parse out the query params in JavaScript, but obviously the ampersand in our embedded uri param makes it more difficult than simply splitting on all ampersands and running with it from there.

What I really want to do is define a regex that matches only question marks and ampersands that are followed by an equals prior to being followed by another ampersand (or end of line). I came up with this which comes close but is including the non-capturing text as well and I'm not sure why:

[?&](?:[^&]+)=

...that results in a match on ?uri= as well as &type= which is close but capturing more than I want. What am I doing wrong such that it's not capturing just the ? and & in matches? In other words, it should only be capturing the ? prior to uri and the & prior to type.

James asked Aug 01 '13 16:08


1 Answer

If I understand correctly and you just want to match the ? or & then your regex should be:

[?&](?==)

Explanation:

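[?&] is a character set containing just ? and &, so it matches exactly one of those characters.

(?= ) is a positive lookahead. It means "this has to come right after the main match, but don't include it in the match". By contrast, the non-capturing group (?: ) in your attempt still consumes what it matches; it only avoids creating a numbered group. A lookahead for a literal = looks kind of funny: (?==). Note that as written this only matches a ? or & that is immediately followed by =; to allow a parameter name in between, the name has to go inside the lookahead too (see the variant further down).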
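To see the difference concretely, here is a small JavaScript sketch comparing the pattern from the question with a lookahead version. The variant [?&](?=[^&]+=) is an adaptation of the asker's pattern made purely for this illustration, not one of the patterns given in this answer, and the variable names are hypothetical.

```javascript
// Sample URL from the question.
const sampleUrl =
  "http://hostname.com?uri=http://website.com/company/YoYo+&+Co+Inc&type=company";

// Original attempt: the non-capturing group still consumes the name and "=".
const consuming = sampleUrl.match(/[?&](?:[^&]+)=/g);

// Illustrative adaptation: the same sub-pattern moved into a lookahead,
// so it is checked but never becomes part of the match.
const lookahead = sampleUrl.match(/[?&](?=[^&]+=)/g);

console.log(consuming); // [ '?uri=', '&type=' ]
console.log(lookahead); // [ '?', '&' ]
```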


If you want to include the word "uri" or "type" then add \w+ after the character set and before the lookahead:

[?&]\w+(?==)

\w matches a word character (letter, digit or underscore) and + means "match 1 or more".


And just one more in case that's not exactly what you're looking for! If you want to get rid of the & or ? but keep the name, wrap the character set in a positive lookbehind. The syntax for that is (?<= ). Be aware that JavaScript only gained lookbehind support with ES2018, so older engines will reject the pattern. That would change the regex to this:

(?<=[?&])\w+(?==)

Example of that at work: http://regexr.com?35q0u
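As a quick check in JavaScript, the sketch below runs both name-matching patterns against the URL from the question. It assumes an engine with lookbehind support (ES2018 or later), and the variable names are made up for the example.

```javascript
// Sample URL from the question.
const questionUrl =
  "http://hostname.com?uri=http://website.com/company/YoYo+&+Co+Inc&type=company";

// Delimiter plus name; the "=" is only asserted by the lookahead.
const withDelimiter = questionUrl.match(/[?&]\w+(?==)/g);

// Name only: the delimiter is asserted by the lookbehind instead of matched.
const namesOnly = questionUrl.match(/(?<=[?&])\w+(?==)/g);

console.log(withDelimiter); // [ '?uri', '&type' ]
console.log(namesOnly);     // [ 'uri', 'type' ]
```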


In response to comment: You can match just the ? and & by putting the \w+ inside of the positive lookahead:

[?&](?=\w+=)
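One way this pattern could serve the original goal is as the delimiter for String.prototype.split. The following is a minimal sketch rather than a full query-string parser: parseQuery is a hypothetical helper, it assumes parameter names consist only of \w characters, and it does no percent-decoding.

```javascript
// Hypothetical helper: split on "?" or "&" only when a "name=" follows.
function parseQuery(url) {
  const [base, ...pairs] = url.split(/[?&](?=\w+=)/);
  const params = {};
  for (const pair of pairs) {
    const eq = pair.indexOf("="); // first "=" separates name from value
    params[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return { base, params };
}

const parsed = parseQuery(
  "http://hostname.com?uri=http://website.com/company/YoYo+&+Co+Inc&type=company"
);
console.log(parsed.base);        // http://hostname.com
console.log(parsed.params.uri);  // http://website.com/company/YoYo+&+Co+Inc
console.log(parsed.params.type); // company
```

This still relies on the assumption stated in the question: if the embedded URI itself contained something like &name=, it would be split there as well.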

And because I'm bored and like regexes a bit too much, here's one that will match the value of each parameter:

(?<==).*?(?=[&?]\w+=|$)

Example: http://regexr.com?35q11 There are multiple highlighted sections because global matching is on.
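The same thing can be tried in JavaScript with the g flag, which is what produces one match per parameter. This is only a sketch: it assumes an engine with lookbehind support (ES2018 or later), and the variable names are invented for the example.

```javascript
// Sample URL from the question.
const exampleUrl =
  "http://hostname.com?uri=http://website.com/company/YoYo+&+Co+Inc&type=company";

// Start right after an "=", then lazily take characters until the next
// "?name=" / "&name=" delimiter or the end of the string.
const values = exampleUrl.match(/(?<==).*?(?=[&?]\w+=|$)/g);

console.log(values);
// [ 'http://website.com/company/YoYo+&+Co+Inc', 'company' ]
```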

JDiPierro answered Oct 04 '22 06:10
