I want a regex that does one thing if the string contains 3 instances of ".", and something else if it contains more than 3 instances.
For example:
aaa.bbb.ccc.ddd // one part of the regex
aaa.bbb.ccc.ddd.eee // the second part of the regex
How do I achieve this in either JS or C#?
Something like
?(\.){4} then THIS else THAT
within the regex.
Update
OK, basically this is what I'm doing:
I want to switch any given System.Uri to another subdomain in an extension method.
The problem I came across is that my domains are usually of the form http://subdomain.domain.TLD.TLD/more/url, but sometimes they can be just http://domain.TLD.TLD/more/url (which just points to www).
So this is what I came up with:
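using System;
using System.Text.RegularExpressions;
public static class UriExtensions
{
    // Matches a trailing pseudo-TLD: ".XX[X]" or ".XX[X].XX[X]".
    private const string TopLevelDomainRegex = @"(\.[^\.]{2,3}|\.[^\.]{2,3}\.[^\.]{2,3})$";
    // Host without a subdomain (domain.TLD.TLD): group 4 is empty, group 5 is the host.
    private const string UnspecifiedSubdomainRegex = @"^((http[s]?|ftp):\/\/)(()([^:\/\s]+))(:([^\/]*))?((?:\/)?|(?:\/)(((\w+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?))?$";
    // Host with a subdomain: group 4 is the first label, group 5 is the rest of the host.
    private const string SpecifiedSubdomainRegex = @"^((http[s]?|ftp):\/\/)(([^.:\/\s]*)[\.]([^:\/\s]+))(:([^\/]*))?((?:\/)?|(?:\/)(((\w+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?))?$";
    public static string AbsolutePathToSubdomain(this Uri uri, string subdomain)
    {
        // "www" means "no subdomain"; anything else gets its separating dot.
        subdomain = subdomain == "www" ? string.Empty : string.Concat(subdomain, ".");
        // $1 = "scheme://", $5 = host minus subdomain, $6 = optional ":port"; the path is dropped.
        // (FormatWith was presumably a custom extension; string.Format does the same job.)
        var replacement = string.Format("$1{0}$5$6", subdomain);
        // Strip the TLD(s) from the authority; any dot left over means a subdomain is present.
        var spec = Regex.Replace(uri.Authority, TopLevelDomainRegex, string.Empty).Contains(".");
        return Regex.Replace(uri.AbsoluteUri, spec ? SpecifiedSubdomainRegex : UnspecifiedSubdomainRegex, replacement);
    }
}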
Basically, with this code I take the System.Uri and:
- take just subdomain.domain.TLD.TLD using the Authority property;
- strip the top-level domain(s) with TopLevelDomainRegex, i.e. anything ending in .XX[X] or .XX[X].XX[X];
- end up with either domain or subdomain.domain;
- if what remains has no dots, use UnspecifiedSubdomainRegex, because I couldn't figure out how to use SpecifiedSubdomainRegex to tell it that if it has no dots on that part, it should return string.Empty.
My question, then, is whether there is a way to merge these three regexes into something simpler.
PS: Forget about JavaScript; I was just using it to test the regex on the fly.
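The steps listed above can be sketched in Python as a way to check the logic outside .NET. This is not the author's code: the function name switch_subdomain is hypothetical, it uses urllib.parse.urlsplit(...).hostname instead of Uri.Authority (so any :port is kept out of the TLD check), and it splits the host with string operations rather than the two big URL regexes. Like the C# version, it treats "www" as "no subdomain" and drops the path.

```python
import re
from urllib.parse import urlsplit

# Same heuristic as TopLevelDomainRegex: a trailing ".XX[X]" or ".XX[X].XX[X]".
TOP_LEVEL_DOMAIN = re.compile(r'(\.[^.]{2,3}|\.[^.]{2,3}\.[^.]{2,3})$')


def switch_subdomain(url: str, subdomain: str) -> str:
    """Hypothetical helper: scheme://[subdomain.]domain.TLD[.TLD][:port], path dropped."""
    parts = urlsplit(url)
    host = parts.hostname or ''  # unlike Uri.Authority, this excludes any :port
    tld_match = TOP_LEVEL_DOMAIN.search(host)
    tld = tld_match.group(0) if tld_match else ''
    name = host[:len(host) - len(tld)]  # "domain" or "subdomain.domain"
    # A remaining dot means a subdomain is present: drop the first label.
    domain = name.split('.', 1)[1] if '.' in name else name
    prefix = '' if subdomain == 'www' else subdomain + '.'
    port = f':{parts.port}' if parts.port else ''
    return f'{parts.scheme}://{prefix}{domain}{tld}{port}'
```

For example, switch_subdomain("http://blog.example.co.uk/more/url", "shop") and switch_subdomain("http://example.co.uk/more/url", "shop") both return "http://shop.example.co.uk", so the "no subdomain" case needs no second regex. The 2-3 character TLD heuristic is inherited from the original and will misfire on short registered domains.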
You can do this using the (?(?=condition)then|else) construct. However, this is not available in JavaScript (but it is available in .NET, Perl and PCRE):
^(?(?=(?:[^.]*\.){3}[^.]*$)aaa|eee)
This pattern, for example, checks whether a string contains exactly three dots; if it does, it tries to match aaa at the start of the string, otherwise it tries to match eee. So it will match the first three letters of
aaa.bbb.ccc.ddd
eee.ddd.ccc.bbb.aaa
eee
but fail on
aaa.bbb.ccc
eee.ddd.ccc.bbb
aaa.bbb.ccc.ddd.eee
Explanation:
^              # Start of string
(?             # Conditional: If the following lookahead succeeds:
 (?=           #  Positive lookahead - can we match...
  (?:          #   the following group, consisting of
   [^.]*\.     #    0+ non-dots and 1 dot
  ){3}         #   3 times
  [^.]*        #   followed only by non-dots...
  $            #   until end-of-string?
 )             #  End of lookahead
 aaa           # Then try to match aaa
|              # else...
 eee           # try to match eee
)              # End of conditional
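Python's standard re module, like JavaScript, has no (?(?=...)then|else) conditional (it only supports conditionals on a group number or name). A sketch of a common workaround, not part of the answer above: guard each branch with its own lookahead, positive for "then" and negative for "else". The helper name conditional_match is hypothetical.

```python
import re

# The condition from the answer: exactly three dots in the whole string.
THREE_DOTS = r'(?:[^.]*\.){3}[^.]*$'
# Emulated conditional: "then" branch behind a positive lookahead,
# "else" branch behind a negative lookahead on the same condition.
EMULATED = re.compile(rf'^(?:(?={THREE_DOTS})aaa|(?!{THREE_DOTS})eee)')


def conditional_match(s: str):
    """Return the matched prefix ("aaa" or "eee"), or None if nothing matches."""
    m = EMULATED.match(s)
    return m.group(0) if m else None


for s in ('aaa.bbb.ccc.ddd', 'eee.ddd.ccc.bbb.aaa', 'eee',
          'aaa.bbb.ccc', 'eee.ddd.ccc.bbb', 'aaa.bbb.ccc.ddd.eee'):
    print(f'{s!r}: {conditional_match(s)}')
```

The negative lookahead is what keeps this faithful to the conditional: when the string has exactly three dots but does not start with aaa, the else branch is not tried, so eee.ddd.ccc.bbb fails just as it does with the .NET/PCRE construct. The cost is that the condition is written twice.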
^(?:[^.]*\.[^.]*){3}$
The regex above matches a string that contains exactly 3 dots (demo: http://rubular.com/r/Tsaemvz1Yi).
^(?:[^.]*\.[^.]*){4,}$
This one matches a string that contains 4 or more dots (demo: http://rubular.com/r/IJDeQWVhEB).
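Applied to the original question, the two anchored patterns can simply be tried in turn, taking one branch for exactly three dots and another for more. A minimal Python sketch; the function name classify and its return labels are hypothetical.

```python
import re

EXACTLY_THREE_DOTS = re.compile(r'^(?:[^.]*\.[^.]*){3}$')
FOUR_OR_MORE_DOTS = re.compile(r'^(?:[^.]*\.[^.]*){4,}$')


def classify(s: str) -> str:
    """Pick a branch as the question asks: one for 3 dots, another for more."""
    if EXACTLY_THREE_DOTS.match(s):
        return 'three dots'        # e.g. aaa.bbb.ccc.ddd
    if FOUR_OR_MORE_DOTS.match(s):
        return 'more than three'   # e.g. aaa.bbb.ccc.ddd.eee
    return 'fewer than three'


print(classify('aaa.bbb.ccc.ddd'))
print(classify('aaa.bbb.ccc.ddd.eee'))
```

Because the [^.]* at the end of one repetition sits right next to the [^.]* at the start of the next, these patterns can backtrack a lot on long non-matching input; ^(?:[^.]*\.){3}[^.]*$ and ^(?:[^.]*\.){4,}[^.]*$ accept the same strings without that ambiguity.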
In Python (excuse me, but regexes know no language borders):
import re

# Matches only if the string contains at least 4 dots; no $ anchor is needed.
regx = re.compile(r'^([^.]*?\.){3}[^.]*?\.')

for ss in ("aaa.bbb.ccc",
           "aaa.bbb.ccc.ddd",
           "aaa.bbb.ccc.ddd.eee",
           "a.b.c.d.e.f.g.h.i..."):
    if regx.search(ss):
        print(ss + ' has at least 4 dots in it')
    else:
        print(ss + ' has a maximum of 3 dots in it')
Result:
aaa.bbb.ccc has a maximum of 3 dots in it
aaa.bbb.ccc.ddd has a maximum of 3 dots in it
aaa.bbb.ccc.ddd.eee has at least 4 dots in it
a.b.c.d.e.f.g.h.i... has at least 4 dots in it
This pattern doesn't require the entire string to be analysed (there is no $ anchor): the search can report a match as soon as the fourth dot is found, which makes it better on long strings.
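To gain some confidence that dropping the $ anchor does not change the answer, the unanchored pattern can be checked exhaustively against plain dot counting on short strings. This is only a sanity-check sketch; the name AT_LEAST_FOUR_DOTS and the choice of alphabet and lengths are assumptions for illustration.

```python
import itertools
import re

AT_LEAST_FOUR_DOTS = re.compile(r'^([^.]*?\.){3}[^.]*?\.')

# Compare the regex with str.count on every string of length 0..8
# built from the alphabet {"a", "."} (511 strings in total).
mismatches = [
    s
    for n in range(9)
    for s in (''.join(p) for p in itertools.product('a.', repeat=n))
    if bool(AT_LEAST_FOUR_DOTS.search(s)) != (s.count('.') >= 4)
]
print(f'{len(mismatches)} mismatches')
```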