I have a URL pattern that needs to contain either APPLES or ORANGES in it, no other value. Optionally, it can also have query parameters. I've tried a number of RegEx patterns, but I just can't get a pattern that will respect the strict match.
Sample URLs
Good
http://www.website.com/en/pages/APPLES
http://www.website.com/en/pages/APPLES?k=v
http://www.website.com/en/pages/ORANGES?k=v&k2=v2
http://www.website.com/en/pages/ORANGES
Bad
http://www.website.com/en/pages/APPLES???k=v
http://www.website.com/en/pages/APPLES?k=v=v
http://www.website.com/en/pages/APPLESORANGES
http://www.website.com/en/pages/1APPLES
http://www.website.com/en/APPLES
Attempted RegEx Patterns (well, at least the best attempts)
(http://*.*.website*.*.com/*.*/pages(/APPLES)|(/ORANGES)[\?]*.*)
(http://*.*.website*.*.com/*.*/pages(/APPLES|/ORANGES)[\?]*.*)
If you're curious, I intentionally want to allow any sub-domain, suffix after "website" (for different environments), and any path between .com/ and /pages, hence the use of . in a number of places.
What would be the best way to achieve this?
**Edit: Final Answer**
My final answer was merged from mathematical.coffee and fardjad.
^https?://.*\.website\b.*\.com/.*/pages/(APPLES\b|ORANGES\b)((\?\w+=\w+)(&?\w+=\w+)*)?$
The single limitation I've discovered is that it will not allow a few valid characters (.~_-%+) in the query string parameter key=value pairs (see: http://en.wikipedia.org/wiki/Query_string#Structure). This isn't an issue for me as I'm matching against a string returned from .NET's Uri class, so I know the URL is well-formed overall.
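A quick way to sanity-check the final pattern is to run it against the sample URLs from the question. The sketch below does that in Python (used here only for illustration, since the post itself targets .NET). The `WIDENED` variant and the `accepts` helper are hypothetical additions, not part of the original answer: `WIDENED` assumes that letting keys and values also contain `.~_-%+` is acceptable.

```python
import re

# The final pattern from the post, unchanged.
FINAL = re.compile(
    r"^https?://.*\.website\b.*\.com/.*/pages/(APPLES\b|ORANGES\b)"
    r"((\?\w+=\w+)(&?\w+=\w+)*)?$"
)

# Hypothetical variant: same structure, but keys and values may also
# contain . ~ _ - % + (the characters the post lists as not allowed).
QCHAR = r"[\w.~%+-]"
WIDENED = re.compile(
    r"^https?://.*\.website\b.*\.com/.*/pages/(APPLES\b|ORANGES\b)"
    rf"((\?{QCHAR}+={QCHAR}+)(&?{QCHAR}+={QCHAR}+)*)?$"
)

GOOD = [
    "http://www.website.com/en/pages/APPLES",
    "http://www.website.com/en/pages/APPLES?k=v",
    "http://www.website.com/en/pages/ORANGES?k=v&k2=v2",
    "http://www.website.com/en/pages/ORANGES",
]
BAD = [
    "http://www.website.com/en/pages/APPLES???k=v",
    "http://www.website.com/en/pages/APPLES?k=v=v",
    "http://www.website.com/en/pages/APPLESORANGES",
    "http://www.website.com/en/pages/1APPLES",
    "http://www.website.com/en/APPLES",
]


def accepts(pattern, url):
    """Return True if the anchored pattern matches the whole URL."""
    return pattern.search(url) is not None


if __name__ == "__main__":
    for url in GOOD + BAD:
        print(accepts(FINAL, url), accepts(WIDENED, url), url)
```

One thing this makes visible: because the `&` is optional in `(&?\w+=\w+)*`, a query such as `?a=bc=d` is accepted as well (it is read as `a=b` followed by `c=d`). Changing `&?` to `&` would close that gap, if it matters for your input.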
I think the *.* should be .*:
http://.*\.website\b.*\.com/.*/pages/PAGE[12](\?[^=]+=[^&=]+(&[^=]+=[^=&]+)*)?
Explanation:
http:// # just http://
.*\. # anything, just make sure it's followed by '.'
website\b # website, the whole word
.*\.com # anything between website and .com
/.*/pages/ # anything between the .com and the pages
PAGE[12] # PAGE1 or PAGE2
(\? # opening bracket and '?' (query string)
[^=]+ # the key: I've said it can't include =
= # =
[^=&]+ # the value: I've said it can't include = or &
(& # opening bracket and '&' for next part of query string
[^=]+=[^=&]+ # key=value pair, same regex as before
)* # 0 or more of these (the &key=value)
)? # the entire query string is optional.
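The commented breakdown above can be kept next to the pattern itself if your regex engine has a free-spacing mode. Here is a minimal sketch using Python's `re.VERBOSE` (again only as an illustration; the `is_match` helper is a hypothetical name). The pattern in the post has no anchors, so this sketch assumes you want the whole string to match and uses `fullmatch` for that.

```python
import re

PATTERN = re.compile(
    r"""
    http://          # just http://
    .*\.             # anything, as long as it is followed by '.'
    website\b        # website, as a whole word
    .*\.com          # anything between website and .com
    /.*/pages/       # anything between .com and pages
    PAGE[12]         # PAGE1 or PAGE2
    (\?              # open group, then '?' (start of the query string)
      [^=]+          # the key: may not contain '='
      =              # literal '='
      [^=&]+         # the value: may not contain '=' or '&'
      (&             # open group, then '&' for the next pair
        [^=]+=[^=&]+ # key=value pair, same rules as before
      )*             # zero or more additional pairs
    )?               # the entire query string is optional
    """,
    re.VERBOSE,
)


def is_match(url):
    """Return True only if the pattern covers the entire URL."""
    return PATTERN.fullmatch(url) is not None


if __name__ == "__main__":
    print(is_match("http://www.website.com/en/pages/PAGE1"))
    print(is_match("http://www.website.com/en/pages/PAGE2?k=v&k2=v2"))
    print(is_match("http://www.website.com/en/pages/PAGE1?k=v=v"))
```

Note that the key class `[^=]+` only excludes `=`, so a key may still contain `?` or `&`. Tighten it if you need to reject input such as `???k=v`.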
NOTE - there are usually problems when you parse a query string with a regex and try to make sure the query string is syntactically valid.
For example, in the regex I supplied above, I've said that the value in &key=value can't have an ampersand in it. But it could be an escaped entity, like &amp;, which is legal.
You'll always suffer from this sort of problem when you try to parse syntax with regex. It's a risk you'll have to take.
Alternatively, I am sure there is a C# module to parse URLs (many other languages have these), and they take care of all these special cases for you.
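As a rough illustration of the parser-based approach, here is a sketch that uses Python's `urllib.parse.urlsplit` as a stand-in for whatever URL class your platform offers. The `is_allowed` function, the loose host rule, and the set of characters permitted in keys and values are all assumptions made for this example. It is not guaranteed to agree with the regexes above on every input.

```python
import re
from urllib.parse import urlsplit

ALLOWED_PAGES = {"APPLES", "ORANGES"}

# One key=value pair; the permitted characters are an assumption.
PAIR = re.compile(r"[\w.~%+-]+=[\w.~%+-]+")


def is_allowed(url: str) -> bool:
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False

    # Loose host rule (assumption): the host ends in .com and one of the
    # earlier labels begins with "website", e.g. www.website-dev.com.
    labels = (parts.hostname or "").split(".")
    if labels[-1] != "com" or not any(
        label.startswith("website") for label in labels[:-1]
    ):
        return False

    # The path must look like /<something>/pages/<PAGE>.
    segments = parts.path.split("/")
    if (
        len(segments) < 4
        or segments[-2] != "pages"
        or segments[-1] not in ALLOWED_PAGES
    ):
        return False

    # Every &-separated piece of the query must be exactly one key=value.
    if parts.query:
        return all(PAIR.fullmatch(p) for p in parts.query.split("&"))
    return True


if __name__ == "__main__":
    print(is_allowed("http://www.website.com/en/pages/ORANGES?k=v&k2=v2"))
    print(is_allowed("http://www.website.com/en/pages/APPLES?k=v=v"))
```

Splitting the URL first means the page name is compared as a whole path segment, so no `\b` tricks are needed to reject `APPLESORANGES` or `1APPLES`.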