Regex for ANY string except "www"? (subdomain)

I was wondering if someone out there could help me with a regex in C#. I think it's fairly simple, but I've been racking my brain over it and I'm not quite sure why I'm having such a hard time. :)

I've found a few examples around but I can't seem to manipulate them to do what I need.

I just need to match ANY alphanumeric+dashes subdomain string that is not "www", and just up to the "."

Also, ideally, if someone were to type "www.subdomain.domain.com" I would like the www to be ignored if possible. If not, it's not a huge issue.

In other words, I would like to match:

  • (test).domain.com
  • (test2).domain.com
  • (wwwasdf).domain.com
  • (asdfwww).domain.com
  • (w).domain.com
  • (wwwwww).domain.com
  • (asfd-12345-www-bananas).domain.com
  • www.(subdomain).domain.com

And I don't want to match:

  • (www).domain.com

It seems to me like it should be easy, but I'm having trouble with the "not match" part.

For what it's worth, this is for use in the IIS 7 URL Rewrite Module, to rewrite for all non-www subdomains.

Thanks!

trnelson asked Aug 17 '11 20:08


2 Answers

Is the remainder of the domain name constant, like .domain.com, as in your examples? Try this:

\b(?!www\.)(\w+(?:-\w+)*)(?=\.domain\.com\b)

Explanation:

  • \w+(?:-\w+)* matches a generic domain-name component as you described, but a little more rigorously: the component cannot start or end with a dash, and dashes cannot be doubled.

  • (?=\.domain\.com\b) makes sure it's the first subdomain (i.e., the last one before the actual domain name).

  • \b(?!www\.) makes sure it isn't www. (without the \b, it could skip over the first w and match just the ww.).

In my tests, this regex matches precisely the parts you highlighted in your examples, and does not match the www. in either of the last two examples.
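
One way to check this claim is to run the pattern over the hostnames from the question. The sketch below uses Python's re module purely for illustration (an assumption; the question targets .NET and IIS), since the lookaheads, \b, and \w used here behave the same way in both engines for these inputs. The helper name first_subdomain is hypothetical.

```python
import re

# Pattern from the answer above, copied unchanged.
SUBDOMAIN = re.compile(r"\b(?!www\.)(\w+(?:-\w+)*)(?=\.domain\.com\b)")

def first_subdomain(host):
    """Return the label directly before .domain.com, or None if it is www or absent."""
    m = SUBDOMAIN.search(host)
    return m.group(1) if m else None

hosts = [
    "test.domain.com",
    "wwwasdf.domain.com",
    "w.domain.com",
    "wwwwww.domain.com",
    "asfd-12345-www-bananas.domain.com",
    "www.subdomain.domain.com",
    "www.domain.com",  # the only host here that should yield None
]
for host in hosts:
    print(host, "->", first_subdomain(host))
```

Note that \w also accepts underscores, so a character class such as [A-Za-z0-9] would be a stricter fit for "alphanumeric" if that distinction matters.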


EDIT: Here's another version which matches the whole name, capturing the pieces in different groups:

^((?:\w+(?:-\w+)*\.)*)((?!www\.)\w+(?:-\w+)*)(\.domain\.com)$

In most cases, group $1 will contain an empty string because there's nothing before the subdomain name, but here's how it breaks down www.subdomain.domain.com:

$1: "www."
$2: "subdomain"
$3: ".domain.com"
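
The same breakdown can be reproduced with a short script. This is again a Python sketch used only for illustration, with the pattern copied as-is and split across lines for commenting; split_host is a hypothetical helper name.

```python
import re

FULL_HOST = re.compile(
    r"^((?:\w+(?:-\w+)*\.)*)"    # $1: any leading labels, e.g. "www."
    r"((?!www\.)\w+(?:-\w+)*)"   # $2: the subdomain, which must not be "www"
    r"(\.domain\.com)$"          # $3: the fixed domain suffix
)

def split_host(host):
    """Return the three captured groups as a tuple, or None if the host is rejected."""
    m = FULL_HOST.match(host)
    return m.groups() if m else None

print(split_host("www.subdomain.domain.com"))  # ('www.', 'subdomain', '.domain.com')
print(split_host("test.domain.com"))           # ('', 'test', '.domain.com')
print(split_host("www.domain.com"))            # None
```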
Alan Moore answered Sep 17 '22 23:09


^www\.

Test against this pattern and invert the logic: if it matches, your string does not meet your requirements.
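
A minimal sketch of that inverted check, written in Python for illustration only (the question itself targets IIS and .NET); is_non_www is a hypothetical helper name.

```python
import re

# Pattern from this answer: the host begins with a literal "www."
WWW_PREFIX = re.compile(r"^www\.")

def is_non_www(host):
    """Inverted logic: a match means the host starts with www. and is rejected."""
    return WWW_PREFIX.search(host) is None

for host in ["test.domain.com", "wwwasdf.domain.com", "www.domain.com"]:
    print(host, "->", is_non_www(host))
```

Under this check, www.subdomain.domain.com is rejected as well, because it also starts with www.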

mopsled answered Sep 19 '22 23:09