I'm trying to detect whether a combo box contains an ISO language code (e.g. en-GB, el-GR, ru-RU), which consists of 2 alphabetical characters, a dash, and 2 more alphabetical characters (in upper case, or it might not matter?).
I was wondering, is there a way I can achieve this using regular expressions?
I'm assuming the expression would look something like this (but I don't have much experience in the subject):
string pattern = @"^\a{2,2}-\a{2,2}";
Short answer: yes.
Something like this should work: ^[a-z]{2}-[A-Z]{2}$
The ^ anchor instructs the regex engine to start matching from the beginning of the string, [a-z] means any lower-case letter between a and z, and {2} means exactly 2 repetitions of the preceding element. The same explanation holds for the rest. Finally, the $ anchor instructs the regex engine to stop matching at the end of the string.
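As a rough illustration of how that pattern could be applied in C#, here is a minimal sketch. The class and method names (LanguageCodeShape, IsStrictMatch, IsAnyCaseMatch) are made up for this example, and the text is assumed to arrive as a plain string (for instance, whatever you read from the combo box). The second regex adds RegexOptions.IgnoreCase for the "it might not matter" case from the question.

```csharp
using System.Text.RegularExpressions;

public static class LanguageCodeShape
{
    // Strict form: two lower-case letters, a dash, two upper-case letters.
    private static readonly Regex Strict =
        new Regex("^[a-z]{2}-[A-Z]{2}$");

    // Relaxed form: same shape, but letter case is ignored.
    private static readonly Regex AnyCase =
        new Regex("^[a-z]{2}-[A-Z]{2}$", RegexOptions.IgnoreCase);

    // Hypothetical helper: true only for input shaped like "en-GB".
    public static bool IsStrictMatch(string text)
    {
        return text != null && Strict.IsMatch(text);
    }

    // Hypothetical helper: also accepts "EN-gb", "En-Gb", and so on.
    public static bool IsAnyCaseMatch(string text)
    {
        return text != null && AnyCase.IsMatch(text);
    }
}
```

With this sketch, "en-GB" passes both checks, "EN-gb" passes only IsAnyCaseMatch, and "en-GBR" passes neither.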
The accepted solution by @npinti may not be accurate enough if we take a closer look at the list of ISO 639x codes here. Alternatively, you can get a culture list yourself by invoking the static method below (C# code):
System.Globalization.CultureInfo.GetCultures(CultureTypes.AllCultures);
Among the retrieved values you will find samples that do not match, such as "Cy-az-AZ" (three parts!), "zh-CHS" (three letters!) or "en-029" (digits!).
Curiously enough, the one with digits does not appear in the MS link above, even though it is retrieved by the CultureInfo method.
This article from here discusses the one with numbers.
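To check which names fall outside the simple two-letter pattern on your own machine, something like the sketch below could be used. The class and method names are hypothetical, and the result depends on the .NET version and the operating system's culture data, so no particular output is assumed here.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class CultureNameSurvey
{
    // The simple shape from the accepted answer: "xx-XX".
    private static readonly Regex SimpleShape =
        new Regex("^[a-z]{2}-[A-Z]{2}$");

    // Hypothetical helper: returns every culture name known to this
    // runtime that does NOT have the simple "xx-XX" shape.
    public static List<string> NamesNotMatchingSimpleShape()
    {
        return CultureInfo.GetCultures(CultureTypes.AllCultures)
            .Select(culture => culture.Name)
            // The invariant culture has an empty name; skip it.
            .Where(name => name.Length > 0 && !SimpleShape.IsMatch(name))
            .ToList();
    }
}
```

Because CultureTypes.AllCultures also includes neutral cultures, plain language names such as "en" will show up in the result alongside the longer or numeric ones.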
So this doesn't seem to be an easy problem. We could try a slightly more complex regex like the one shown below, but it doesn't guarantee that we can distinguish an ISO culture code from anything else. IMO, if we really need to be 100% reliable, probably the only choice is to look the code up in the list of known codes and find an exact match.
Regex option:
^[^-]{2,3}-[^-]{2,3}(-[^-]{2,3})?$
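A minimal sketch of how this looser pattern might be wrapped in C# follows. The class and method names are invented for the example.

```csharp
using System.Text.RegularExpressions;

public static class LooseCultureShape
{
    // Two or three groups of 2-3 non-dash characters, separated by dashes.
    private static readonly Regex Loose =
        new Regex("^[^-]{2,3}-[^-]{2,3}(-[^-]{2,3})?$");

    // Hypothetical helper: checks the shape only, not whether the
    // culture actually exists.
    public static bool LooksLikeCultureCode(string text)
    {
        return text != null && Loose.IsMatch(text);
    }
}
```

This accepts "en-GB", "Cy-az-AZ", "zh-CHS" and "en-029", but it also accepts something like "!!-??", because [^-] allows any character that is not a dash. That is why the regex alone cannot guarantee a real culture code.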
Find option:
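using System;
using System.Globalization;

// Returns true if the given code equals the name of a culture known to .NET (case-insensitive).
public static bool IsCultureCode(string code)
{
    // Use CultureTypes.AllCultures instead to include neutral cultures as well.
    CultureInfo[] cultures = CultureInfo.GetCultures(CultureTypes.SpecificCultures);
    int i = 0;
    // Advance until a culture name matches or the list is exhausted.
    while (i < cultures.Length && !cultures[i].Name.Equals(code, StringComparison.InvariantCultureIgnoreCase))
        i++;
    return i < cultures.Length;
}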