I was wondering if someone out there could help me with a regex in C#. I think it's fairly simple but I've been wracking my brain over it and not quite sure why I'm having such a hard time. :)
I've found a few examples around but I can't seem to manipulate them to do what I need.
I just need to match ANY alphanumeric+dashes subdomain string that is not "www", and just up to the "."
Also, ideally, if someone were to type "www.subdomain.domain.com" I would like the www to be ignored if possible. If not, it's not a huge issue.
In other words, I would like to match:
And I don't want to match:
It seems to me like it should be easy, but I'm having troubles with the "not match" part.
For what it's worth, this is for use in the IIS 7 URL Rewrite Module, to rewrite for all non-www subdomains.
Thanks!
[] denotes a character class. () denotes a capturing group. [a-z0-9] -- One character that is in the range of a-z OR 0-9. (a-z0-9) -- Explicit capture of a-z0-9 .
The valid domain name must satisfy the following conditions: The domain name should be a-z or A-Z or 0-9 and hyphen (-). The domain name should be between 1 and 63 characters long. The domain name should not start or end with a hyphen(-) (e.g. -geeksforgeeks.org or geeksforgeeks.org-).
In C#, Regular Expression is a pattern which is used to parse and check whether the given input text is matching with the given pattern or not. In C#, Regular Expressions are generally termed as C# Regex. The . Net Framework provides a regular expression engine that allows the pattern matching.
Is the remainder of the domain name constant, like .domain.com
, as in your examples? Try this:
\b(?!www\.)(\w+(?:-\w+)*)(?=\.domain\.com\b)
Explanation:
\w+(?:-\w+)*
matches a generic domain-name component as you described (but a little more rigorously).
(?=\.domain\.com\b)
makes sure it's the first subdomain (i.e., the last one before the actual domain name).
\b(?!www\.)
makes sure it isn't www.
(without the \b
, it could skip over the first w
and match just the ww.
).
In my tests, this regex matches precisely the parts you highlighted in your examples, and does not match the www.
in either of the last two examples.
EDIT: Here's another version which matches the whole name, capturing the pieces in different groups:
^((?:\w+(?:-\w+)*\.)*)((?!www\.)\w+(?:-\w+)*)(\.domain\.com)$
In most cases, group $1
will contain an empty string because there's nothing before the subdomain name, but here's how it breaks down www.subdomain.domain.com
:
$1: "www."
$2: "subdomain"
$3: ".domain.com"
^www\.
And invert the logic for this bit, so if it matches, then your string does not meet your requirements.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With