
How can I validate a culture code with a regular expression?

Tags:

regex

I don't really understand regex, and I can't find a pattern that validates culture codes such as en-GB, en-UK, az-AZ-Cyrl, and others.

How can I validate these codes with a regular expression?

SameName69 asked Oct 18 '10 19:10




3 Answers

You can validate with this:

/^[a-z]{2,3}(?:-[A-Z]{2,3}(?:-[a-zA-Z]{4})?)?$/

Here is how it works:

^       <- Start of the string
[a-z]   <- From a to z (lower-case)
{2,3}   <- Repeated at least 2 times, at most 3
(?:     <- Non-capturing group
   -         <- The "-" character
   [A-Z]     <- From A to Z (upper-case)
   {2,3}     <- Repeated at least 2 times, at most 3
   (?:       <- Non-capturing group
       -         <- The "-" character
       [a-zA-Z]  <- Any letter from a to z, in either case
       {4}       <- Repeated exactly 4 times
   )         <- End of the group
   ?         <- The group is optional
)       <- End of the group
?       <- The group is optional
$       <- End of the string

You can also replace the last non-capturing group with (?:-(?:Cyrl|Latn))? if the only script options are Cyrl and Latn.
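As a rough illustration, here is how the pattern above and its Cyrl/Latn variant might be used from Python. The question does not name a language, so Python, the constant names, and the helper `is_culture_code` are assumptions made for this sketch only.

```python
import re

# Pattern from the answer: language, optional region, optional script.
CULTURE_RE = re.compile(r"^[a-z]{2,3}(?:-[A-Z]{2,3}(?:-[a-zA-Z]{4})?)?$")

# Variant from the answer that only accepts the Cyrl and Latn scripts.
CULTURE_CYRL_LATN_RE = re.compile(
    r"^[a-z]{2,3}(?:-[A-Z]{2,3}(?:-(?:Cyrl|Latn))?)?$"
)


def is_culture_code(code: str, pattern: re.Pattern = CULTURE_RE) -> bool:
    """Return True if `code` has the shape described by `pattern`.

    fullmatch is used because in Python `$` also matches just before a
    trailing newline, which would let "en-GB\\n" through with match().
    """
    return pattern.fullmatch(code) is not None


if __name__ == "__main__":
    for code in ("en-GB", "en-UK", "az-AZ-Cyrl", "EN-gb", "az-Cyrl-AZ"):
        print(code, is_culture_code(code))
```

Note that the pattern only checks the shape of the code. It cannot tell a registered code from a made-up one such as xx-YY, so a lookup against a list of known cultures is still needed if that matters.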

Colin Hebert answered Oct 24 '22 01:10


This is what I found in the Dublin Core / W3C XSDs: http://www.w3.org/2001/XMLSchema

  <xs:simpleType name="language" id="language"> 
    <xs:annotation> 
      <xs:documentation 
        source="http://www.w3.org/TR/xmlschema-2/#language"/> 
    </xs:annotation> 
    <xs:restriction base="xs:token"> 
      <xs:pattern 
        value="[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*"
                id="language.pattern"> 
        <xs:annotation> 
          <xs:documentation 
                source="http://www.ietf.org/rfc/rfc3066.txt"> 
            pattern specifies the content of section 2.12 of XML 1.0e2
            and RFC 3066 (Revised version of RFC 1766).
          </xs:documentation> 
        </xs:annotation> 
      </xs:pattern> 
    </xs:restriction> 
  </xs:simpleType>

So the pattern is:

[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*
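A minimal sketch of applying this pattern outside a schema validator, assuming Python. XSD patterns implicitly have to match the whole value, so the sketch uses `fullmatch` instead of adding `^` and `$`; the names `XS_LANGUAGE_RE` and `is_xs_language` are made up for the example.

```python
import re

# The xs:language pattern, copied from the schema fragment.
XS_LANGUAGE_RE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_xs_language(tag: str) -> bool:
    """Return True if the whole of `tag` matches the xs:language pattern."""
    return XS_LANGUAGE_RE.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in ("en-GB", "az-AZ-Cyrl", "az-Cyrl-AZ", "es-419", "en_GB", "en-"):
        print(tag, is_xs_language(tag))
```

This pattern is much more permissive than a fixed language-region-script shape: it accepts any number of subtags in any order and any letter case, so it also lets through strings like a-b-c-d.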

Patrick Ferreira answered Oct 24 '22 01:10


According to https://en.wikipedia.org/wiki/IETF_language_tag, the regex can be:

/^[a-z]{2,3}(?:-[a-zA-Z]{4})?(?:-[A-Z]{2,3})?$/

From Wikipedia:

a single primary language subtag based on a two-letter language code from ISO 639-1 (2002) or a three-letter code from ISO 639-2 (1998), ISO 639-3 (2007) or ISO 639-5 (2008), or registered through the BCP 47 process and composed of five to eight letters;

an optional script subtag, based on a four-letter script code from ISO 15924 (usually written in title case);

an optional region subtag based on a two-letter country code from ISO 3166-1 alpha-2 (usually written in upper case), or a three-digit code from UN M.49 for geographical regions;
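The sketch below, in Python (an assumption, since the question names no language), tries the regex from this answer against a few tags. The second pattern, `LANG_TAG_M49_RE`, is not from the answer: it is a hypothetical extension that also accepts the three-digit UN M.49 regions mentioned in the quoted text. Neither pattern covers the five-to-eight-letter primary subtags the quote describes.

```python
import re

# Pattern from the answer: language, optional script, optional region.
LANG_TAG_RE = re.compile(r"^[a-z]{2,3}(?:-[a-zA-Z]{4})?(?:-[A-Z]{2,3})?$")

# Hypothetical extension (not in the answer): the region may also be a
# three-digit UN M.49 code such as 419.
LANG_TAG_M49_RE = re.compile(
    r"^[a-z]{2,3}(?:-[a-zA-Z]{4})?(?:-(?:[A-Z]{2,3}|[0-9]{3}))?$"
)


def is_language_tag(tag: str, pattern: re.Pattern = LANG_TAG_RE) -> bool:
    """Return True if the whole of `tag` matches `pattern`."""
    return pattern.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in ("en-GB", "zh-Hant-TW", "az-Cyrl-AZ", "az-AZ-Cyrl", "es-419"):
        print(tag, is_language_tag(tag), is_language_tag(tag, LANG_TAG_M49_RE))
```

Because the script comes before the region here, az-Cyrl-AZ is accepted but az-AZ-Cyrl, the form used in the question, is rejected by both patterns.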

Stepan Seliuk answered Oct 24 '22 01:10