regular expression to detect ISO language code

Tags: c#, regex, winforms

I'm trying to detect whether a combo box contains an ISO language code (e.g. en-GB, el-GR, ru-RU), which consists of two alphabetic characters, a dash, and two more alphabetic characters (in upper case, or it might not matter?).

I was wondering, is there a way I can achieve this using regular expressions?

I'm assuming the expression would look something like this (but I don't have much experience in the subject):

string pattern = @"^\a{2,2}-\a{2,2}";
Themos asked Mar 14 '13 07:03


2 Answers

Something like so should work: ^[a-z]{2}-[A-Z]{2}$.

The ^ anchor instructs the regex engine to start matching at the beginning of the string. [a-z] means any lowercase letter between a and z, and {2} means exactly two repetitions of the preceding element. The same explanation holds for the rest of the pattern, with [A-Z] covering uppercase letters. Finally, the $ anchor requires the match to end at the end of the string.
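As a minimal sketch of how this pattern could be applied in C#, assuming the combo box text is available as a plain string (the class and method names below are hypothetical, not part of the answer):

```csharp
using System.Text.RegularExpressions;

public static class LanguageCodeCheck
{
    // Pattern from the answer above: two lowercase letters, a dash, two uppercase letters.
    private static readonly Regex SimpleCode = new Regex(@"^[a-z]{2}-[A-Z]{2}$");

    // Hypothetical helper: returns true when the text has the xx-XX shape.
    public static bool LooksLikeLanguageCode(string text)
    {
        return text != null && SimpleCode.IsMatch(text);
    }
}
```

A call might look like LanguageCodeCheck.LooksLikeLanguageCode(comboBox1.Text), where comboBox1 is an assumed control name. If letter case should not matter, RegexOptions.IgnoreCase can be passed to the Regex constructor. Note that in .NET the $ anchor also matches just before a final newline, so \z is the stricter choice when a trailing newline must be rejected.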

npinti answered Sep 28 '22 10:09


The accepted solution by @npinti may not be accurate enough if we take a closer look at Microsoft's list of ISO 639x codes. Alternatively, you can get a culture list on your own by invoking the static method below (C# code):

System.Globalization.CultureInfo.GetCultures(CultureTypes.AllCultures);

Among the retrieved values, you will find non-matching samples such as "Cy-az-AZ" (3 codes!), "zh-CHS" (3 letters!) or "en-029" (numbers!). Curiously enough, the one with numbers does not appear in the MS list above, even though it is retrieved by the CultureInfo method.
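To see such names on your own machine, one option is to filter the culture list against the simple xx-XX pattern. This is only a sketch: the class and method names are hypothetical, and the exact list returned depends on the .NET version and the operating system.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class CultureNameSurvey
{
    // Hypothetical helper: culture names known to this runtime that do not have the xx-XX shape.
    public static List<string> NamesNotMatchingSimplePattern()
    {
        var simple = new Regex(@"^[a-z]{2}-[A-Z]{2}$");
        var result = new List<string>();
        foreach (CultureInfo culture in CultureInfo.GetCultures(CultureTypes.AllCultures))
        {
            // The invariant culture has an empty name; skip it.
            if (culture.Name.Length > 0 && !simple.IsMatch(culture.Name))
                result.Add(culture.Name);
        }
        return result;
    }
}
```

Because CultureTypes.AllCultures includes neutral cultures, plain language names such as "en" will show up in the result alongside the longer or numeric ones.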

This article discusses the one with numbers.

So it doesn't seem to be an easy problem. We could try a slightly more complex regex like the one shown below, but it doesn't guarantee that we'll be able to distinguish an ISO culture code from arbitrary other text. IMO, if we really need to be 100% reliable, probably the only choice is to look the code up in the list of codes and find an exact match.

Regex option:

^[^-]{2,3}-[^-]{2,3}(-[^-]{2,3})?$
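A small sketch of how this broader pattern behaves, with hypothetical class and method names:

```csharp
using System.Text.RegularExpressions;

public static class BroadCultureCheck
{
    // Regex from the answer: two or three dash-separated groups of 2-3 non-dash characters.
    private static readonly Regex Broad = new Regex(@"^[^-]{2,3}-[^-]{2,3}(-[^-]{2,3})?$");

    // Hypothetical helper: true when the text has the shape described by the regex.
    public static bool LooksLikeCultureName(string text)
    {
        return text != null && Broad.IsMatch(text);
    }
}
```

It accepts "Cy-az-AZ", "zh-CHS" and "en-029", but it also accepts strings such as "!!-??", because [^-] matches any character other than a dash. That is why the regex alone cannot confirm that a string is a real culture code.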

Find option:

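// Requires: using System; using System.Globalization;
public static bool IsCultureCode(string code)
{
    // SpecificCultures covers language-region pairs; use CultureTypes.AllCultures to include neutral cultures too.
    CultureInfo[] cultures = CultureInfo.GetCultures(CultureTypes.SpecificCultures);
    foreach (CultureInfo culture in cultures)
    {
        // Culture names are plain identifiers, so an ordinal case-insensitive comparison is enough.
        if (culture.Name.Equals(code, StringComparison.OrdinalIgnoreCase))
            return true;
    }
    return false;
}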
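If the check runs often (for example, once per combo box item), the list could be cached in a set instead of being scanned on every call. The following is a sketch of that variation, not the answer's original method; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CultureCodeLookup
{
    // Built once on first use; the comparer makes lookups case-insensitive.
    private static readonly HashSet<string> Names = BuildNames();

    private static HashSet<string> BuildNames()
    {
        var names = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (CultureInfo culture in CultureInfo.GetCultures(CultureTypes.SpecificCultures))
            names.Add(culture.Name);
        return names;
    }

    // Hypothetical helper: true when the code is the name of a specific culture known to this runtime.
    public static bool IsKnownCultureCode(string code)
    {
        return code != null && Names.Contains(code);
    }
}
```

The trade-off is a small amount of memory held for the lifetime of the process in exchange for constant-time lookups. As with the loop version, the set of known names depends on the runtime and operating system.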
Mario Vázquez answered Sep 28 '22 11:09