My regex knowledge is escaping me on this one...
Say I have a URL with a URI as a query parameter, à la:
http://hostname.com?uri=http://website.com/company/YoYo+&+Co+Inc&type=company
...assuming our uri param doesn't contain any params itself, I want to manually parse out the query params in JavaScript, but the ampersand in our embedded uri param makes that more difficult than simply splitting on all ampersands and running with it from there.
What I really want to do is define a regex that matches only question marks and ampersands that are followed by an equals prior to being followed by another ampersand (or end of line). I came up with this which comes close but is including the non-capturing text as well and I'm not sure why:
[?&](?:[^&]+)=
...that results in a match on ?uri= as well as &type=, which is close but capturing more than I want. What am I doing wrong such that it's not capturing just the ? and & in matches? In other words, it should only be capturing the ? prior to uri and the & prior to type.
To match a character that has a special meaning in regex, escape it by prefixing it with a backslash ( \ ). E.g., \. matches "." ; \+ matches "+" ; and \( matches "(" . To match a backslash itself, use \\ .
To match any character except a list of excluded characters, put the excluded characters between [^ and ] . The caret ^ must immediately follow the [ , or else it stands for just itself. The period . is a metacharacter: outside a character class it matches any character except a line break, while inside a character class it is just a literal period.
$ means "Match the end of the string" (the position after the last character in the string).
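As a rough illustration of the three rules above (escaping, negated character classes, and the $ anchor), here is a small JavaScript sketch. The sample strings and variable names are made up for the example.

```javascript
// Escaped metacharacters match themselves literally.
const literalDot = /\./.test('a.b');
const literalPlus = /\+/.test('YoYo+');
const literalParen = /\(/.test('f(x)');
const literalBackslash = /\\/.test('C:\\temp'); // the string contains one backslash

// Negated class: take everything up to (but not including) the first ampersand.
const upToAmp = 'type=company&x=1'.match(/^[^&]+/)[0];

// A caret that is not first inside the brackets is just a literal caret.
const caretIsLiteral = /[a^]/.test('^');

// $ anchors the pattern at the end of the string.
const endsWithCompany = /company$/.test('type=company');
const notAtEnd = /type$/.test('type=company');

console.log(literalDot, literalPlus, literalParen, literalBackslash);
console.log(upToAmp, caretIsLiteral, endsWithCompany, notAtEnd);
```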
If I understand correctly and you just want to match the ? or & then your regex should be:
[?&](?==)
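Explanation:
[?&] is a set of characters containing just ? and &, meaning it will look for one of those.
(?= ) is a positive lookahead. It means "this has to come after the main match, but don't include it in the match". Making it find an = looks kind of funny: (?==)
The (?:[^&]+) in your attempt is a non-capturing group. That only means no capture group is created for it; the text it matches is still consumed and becomes part of the overall match, which is why you got ?uri= and &type=.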
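To make the difference between a non-capturing group and a lookahead concrete, here is a minimal JavaScript sketch using the URL from the question. The variable names and the short 'a?=b&=c' test string are only illustrative. One thing it also shows: [?&](?==) only matches a ? or & that is directly followed by = , so on the question's URL it finds nothing.

```javascript
const url = 'http://hostname.com?uri=http://website.com/company/YoYo+&+Co+Inc&type=company';

// Non-capturing group: the name and the = are still consumed,
// so they show up in the match.
const consumed = url.match(/[?&](?:[^&]+)=/g);

// Lookahead: the = must follow, but it is not part of the match.
const lookahead = 'a?=b&=c'.match(/[?&](?==)/g);

// In the question's URL no ? or & is directly followed by = ,
// so match() returns null here.
const onUrl = url.match(/[?&](?==)/g);

console.log(consumed);  // [ '?uri=', '&type=' ]
console.log(lookahead); // [ '?', '&' ]
console.log(onUrl);     // null
```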
If you want to include the word "uri" or "type" then add \w+ after the character set and before the lookahead:
[?&]\w+(?==)
\w matches a word character (a letter, digit or underscore) and + means "match 1 or more".
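A quick way to check this variant in JavaScript, again against the URL from the question (the variable name is just for illustration):

```javascript
const url = 'http://hostname.com?uri=http://website.com/company/YoYo+&+Co+Inc&type=company';

// The delimiter and the parameter name are matched; the = is only looked at.
// The & in "YoYo+&+Co" is skipped because + is not a word character.
const delimiterAndName = url.match(/[?&]\w+(?==)/g);

console.log(delimiterAndName); // [ '?uri', '&type' ]
```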
And just one more in case that's not exactly what you're looking for! If you want to get rid of the &/? but keep the text, wrap the character set in a positive lookbehind. The syntax for that is (?<= ). Note that lookbehind needs engine support; in JavaScript it was only added in ES2018. That would change the regex to this:
(?<=[?&])\w+(?==)
Example of that at work: http://regexr.com?35q0u
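Since lookbehind is not available in every JavaScript engine, here is a hedged sketch that tries the lookbehind version and falls back to a plain capture group if the engine rejects it. The function name paramNames is hypothetical, not part of any library.

```javascript
const url = 'http://hostname.com?uri=http://website.com/company/YoYo+&+Co+Inc&type=company';

// Hypothetical helper: returns the parameter names without the ? or &.
function paramNames(input) {
  try {
    // Building the pattern with new RegExp defers parsing to run time, so an
    // engine without lookbehind throws here instead of failing the whole script.
    const withLookbehind = new RegExp('(?<=[?&])\\w+(?==)', 'g');
    return input.match(withLookbehind) || [];
  } catch (err) {
    // Fallback: match the delimiter, capture the name in group 1, keep only the group.
    const names = [];
    const withGroup = /[?&](\w+)=/g;
    let m;
    while ((m = withGroup.exec(input)) !== null) {
      names.push(m[1]);
    }
    return names;
  }
}

const names = paramNames(url);
console.log(names); // [ 'uri', 'type' ]
```

The fallback consumes the delimiter and the = , so it is not a drop-in replacement when the pattern is used for splitting, but it yields the same names.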
In response to comment: You can match just the ? and & by putting the \w+ inside of the positive lookahead:
[?&](?=\w+=)
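Because this pattern matches only the delimiters, it can be handed straight to split() to do the manual parsing the question asks about. The following is a minimal sketch, not a full query-string parser: parseQuery is a hypothetical helper, it does no URL decoding, and it relies on the question's assumption that the embedded uri contains no name=value params of its own.

```javascript
const url = 'http://hostname.com?uri=http://website.com/company/YoYo+&+Co+Inc&type=company';

// Hypothetical helper: splits only on a ? or & that starts a "name=" pair.
function parseQuery(input) {
  const parts = input.split(/[?&](?=\w+=)/);
  const params = {};
  // parts[0] is everything before the query string, so skip it.
  for (const part of parts.slice(1)) {
    // Every remaining part starts with "name=", so the first = separates name and value.
    const eq = part.indexOf('=');
    params[part.slice(0, eq)] = part.slice(eq + 1);
  }
  return params;
}

const params = parseQuery(url);
console.log(params.uri);  // http://website.com/company/YoYo+&+Co+Inc
console.log(params.type); // company
```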
And because I'm bored and like regexes a bit too much, here's one that will match the value of each parameter:
(?<==).*?(?=[&?]\w+=|$)
Example: http://regexr.com?35q11 There are multiple highlighted sections because global matching is on.
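The same check can be run in JavaScript, assuming an engine with lookbehind support (ES2018 or later). The variable name is illustrative only.

```javascript
const url = 'http://hostname.com?uri=http://website.com/company/YoYo+&+Co+Inc&type=company';

// Start right after an = and lazily take characters until the next
// "?name=" / "&name=" pair or the end of the string.
const values = url.match(/(?<==).*?(?=[&?]\w+=|$)/g);

console.log(values); // [ 'http://website.com/company/YoYo+&+Co+Inc', 'company' ]
```

With the g flag, match() returns every match, which corresponds to the multiple highlighted sections in the regexr example.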