I really don't understand regex, and I can't find a regex rule to validate culture codes such as en-GB, en-UK, az-AZ-Cyrl, and others. How can I validate these codes with a regular expression?



How can I validate a culture code with a regular expression?

3 Answers

You can validate with this:

/^[a-z]{2,3}(?:-[A-Z]{2,3}(?:-[a-zA-Z]{4})?)?$/

Here is how it works:

^         <- Starts with
[a-z]     <- From a to z (lower-case)
{2,3}     <- Repeated at least 2 times, at most 3
(?:       <- Non-capturing group
   -         <- The "-" character
   [A-Z]     <- From A to Z (upper-case)
   {2,3}     <- Repeated at least 2 times, at most 3
   (?:       <- Non-capturing group
      -         <- The "-" character
      [a-zA-Z]  <- From a to z or A to Z (either case)
      {4}       <- Repeated exactly 4 times
   )         <- End of the group
   ?         <- Optional
)         <- End of the group
?         <- Optional
$         <- Ends here
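To try the pattern quickly, here is a minimal Python sketch (Python is my choice; the answer only gives the `/.../` literal). The helper name `is_culture_code` is hypothetical. `re.fullmatch` plays the role of `^` and `$`, and unlike `$` in Python it also rejects a trailing newline.

```python
import re

# Colin's pattern without ^ and $: re.fullmatch anchors at both ends.
# (In Python, "$" would also accept a trailing "\n"; fullmatch does not.)
CULTURE_CODE = re.compile(r"[a-z]{2,3}(?:-[A-Z]{2,3}(?:-[a-zA-Z]{4})?)?")


def is_culture_code(code: str) -> bool:
    """Hypothetical helper: True if code looks like language[-REGION[-Script]]."""
    return CULTURE_CODE.fullmatch(code) is not None


for code in ["en-GB", "en-UK", "az-AZ-Cyrl", "en", "EN-gb", "az-Cyrl-AZ"]:
    print(code, is_culture_code(code))
```

This prints True for en-GB, en-UK, az-AZ-Cyrl and en, and False for EN-gb and az-Cyrl-AZ. The pattern only checks the shape of the code: en-UK passes even though the United Kingdom's ISO 3166-1 code is GB, and the script must come after the region.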

You can also replace the last non-capturing group with (?:-(?:Cyrl|Latn))? if the only possible scripts are Cyrl and Latn.
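The Cyrl/Latn variant can be checked the same way. Again, this is a Python sketch with a hypothetical helper name, using `re.fullmatch` in place of `^` and `$`:

```python
import re

# Same structure, but the optional script subtag may only be Cyrl or Latn.
CULTURE_CODE_CYRL_LATN = re.compile(r"[a-z]{2,3}(?:-[A-Z]{2,3}(?:-(?:Cyrl|Latn))?)?")


def is_cyrl_latn_culture_code(code: str) -> bool:
    """Hypothetical helper: like the general check, but only Cyrl/Latn scripts."""
    return CULTURE_CODE_CYRL_LATN.fullmatch(code) is not None


for code in ["az-AZ-Cyrl", "az-AZ-Latn", "az-AZ-Arab", "en-GB"]:
    print(code, is_cyrl_latn_culture_code(code))
```

Here az-AZ-Arab is rejected, while az-AZ-Cyrl, az-AZ-Latn and en-GB are accepted.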

answered Oct 24 '22 01:10

Colin Hebert

This is what I found in the Dublin Core / W3C XSDs: http://www.w3.org/2001/XMLSchema

  <xs:simpleType name="language" id="language"> 
    <xs:annotation> 
      <xs:documentation 
        source="http://www.w3.org/TR/xmlschema-2/#language"/> 
    </xs:annotation> 
    <xs:restriction base="xs:token"> 
      <xs:pattern 
        value="[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*"
                id="language.pattern"> 
        <xs:annotation> 
          <xs:documentation 
                source="http://www.ietf.org/rfc/rfc3066.txt"> 
            pattern specifies the content of section 2.12 of XML 1.0e2
            and RFC 3066 (Revised version of RFC 1766).
          </xs:documentation> 
        </xs:annotation> 
      </xs:pattern> 
    </xs:restriction> 
  </xs:simpleType>

So the pattern is:

[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*
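XSD patterns always match against the whole value (they are implicitly anchored), so `re.fullmatch` is the closest Python equivalent. The following is a sketch with a hypothetical helper name. It shows how much looser this pattern is: subtags are case-insensitive, may contain digits, and there can be any number of them.

```python
import re

# The xs:language pattern. XSD patterns always match the whole value,
# so re.fullmatch stands in for the implicit anchoring.
XSD_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_xsd_language(value: str) -> bool:
    """Hypothetical helper: True if value matches the XSD language pattern."""
    return XSD_LANGUAGE.fullmatch(value) is not None


for value in ["en-GB", "az-AZ-Cyrl", "es-419", "EN-gb", "en_GB", "toolongtag"]:
    print(value, is_xsd_language(value))
```

The first four are accepted. en_GB (underscore) and toolongtag (ten letters in one subtag) are rejected.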

answered Oct 24 '22 01:10

Patrick Ferreira

According to https://en.wikipedia.org/wiki/IETF_language_tag, the regexp can be:

/^[a-z]{2,3}(?:-[a-zA-Z]{4})?(?:-[A-Z]{2,3})?$/

From Wikipedia:

a single primary language subtag based on a two-letter language code from ISO 639-1 (2002) or a three-letter code from ISO 639-2 (1998), ISO 639-3 (2007) or ISO 639-5 (2008), or registered through the BCP 47 process and composed of five to eight letters;

an optional script subtag, based on a four-letter script code from ISO 15924 (usually written in title case);

an optional region subtag based on a two-letter country code from ISO 3166-1 alpha-2 (usually written in upper case), or a three-digit code from UN M.49 for geographical regions;
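Here is a Python sketch of this pattern (the helper name is hypothetical, and `re.fullmatch` replaces `^`/`$`). Unlike the first answer, it puts the script before the region, as IETF tags do. So az-Cyrl-AZ is accepted, but the az-AZ-Cyrl form from the question is not. The quoted text also mentions three-digit UN M.49 region codes and five-to-eight-letter primary subtags, and `[A-Z]{2,3}` and `[a-z]{2,3}` do not cover those.

```python
import re

# Stepan's pattern: language, optional Script (4 letters), optional REGION
# (2-3 upper-case letters); the script comes before the region.
LANGUAGE_TAG = re.compile(r"[a-z]{2,3}(?:-[a-zA-Z]{4})?(?:-[A-Z]{2,3})?")


def is_language_tag(tag: str) -> bool:
    """Hypothetical helper: True if tag matches the pattern from this answer."""
    return LANGUAGE_TAG.fullmatch(tag) is not None


for tag in ["en-GB", "az-Cyrl-AZ", "az-AZ-Cyrl", "es-419"]:
    print(tag, is_language_tag(tag))
```

This accepts en-GB and az-Cyrl-AZ and rejects az-AZ-Cyrl and es-419 (Spanish for Latin America, region 419).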

answered Oct 24 '22 01:10

Stepan Seliuk


SameName69

People also ask

3 Answers

Colin Hebert

Patrick Ferreira

Stepan Seliuk

Recent Activity

Donate For Us