I really don't understand regex, and I also can't find any regex rule to validate culture codes such as en-GB, en-UK, az-AZ-Cyrl and others.
How can I validate these codes with a regular expression?
In C#, a regular expression is a pattern used to parse input text and check whether it matches that pattern. Regular expressions in C# are commonly called C# Regex. The .NET Framework provides a regular expression engine that performs the pattern matching.
Short for regular expression, a regex is a string of text that lets you create patterns that help match, locate, and manage text. Perl is a great example of a programming language that utilizes regular expressions.
You can validate them with this pattern:
/^[a-z]{2,3}(?:-[A-Z]{2,3}(?:-[a-zA-Z]{4})?)?$/
Here is how it works:
^ <- Starts with
[a-z] <- From a to z (lower-case)
{2,3} <- Repeated at least 2 times, at most 3
(?: <- Non-capturing group
- <- The "-" character
[A-Z] <- From A to Z (upper-case)
{2,3} <- Repeated at least 2 times, at most 3
(?: <- Non-capturing group
- <- The "-" character
[a-zA-Z] <- From a to z or A to Z (either case)
{4} <- Repeated 4 times
) <- End of the group
? <- Optional
) <- End of the group
? <- Optional
$ <- Ends here
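To try the pattern above, here is a minimal Python sketch using the standard `re` module. The function name `is_culture_code` and the sample codes are just for illustration. Python does not use the `/.../` delimiters, so only the part between the slashes goes into the string, and `re.fullmatch` makes sure a trailing newline is not accepted.

```python
import re

# The pattern from the answer, without the JavaScript-style /.../ delimiters.
CULTURE_RE = re.compile(r"^[a-z]{2,3}(?:-[A-Z]{2,3}(?:-[a-zA-Z]{4})?)?$")


def is_culture_code(code: str) -> bool:
    """Return True if `code` has the shape language[-REGION[-Script]]."""
    return CULTURE_RE.fullmatch(code) is not None


for code in ["en-GB", "en-UK", "az-AZ-Cyrl", "en", "EN-gb", "az-Cyrl-AZ"]:
    print(code, is_culture_code(code))
```

With this pattern, en-GB, en-UK, az-AZ-Cyrl and en are accepted, while EN-gb (wrong casing) and az-Cyrl-AZ (script before region) are rejected.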
You can also replace the last non-capturing group with (?:-(?:Cyrl|Latn))? if the only possible scripts are Cyrl and Latn.
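A quick Python sketch of that variant, with the script subtag restricted to `Cyrl` or `Latn` (the name `STRICT_RE` is illustrative):

```python
import re

# Same pattern, but the optional script subtag may only be Cyrl or Latn.
STRICT_RE = re.compile(r"^[a-z]{2,3}(?:-[A-Z]{2,3}(?:-(?:Cyrl|Latn))?)?$")

print(bool(STRICT_RE.fullmatch("az-AZ-Cyrl")))  # True
print(bool(STRICT_RE.fullmatch("uz-UZ-Latn")))  # True
print(bool(STRICT_RE.fullmatch("zh-CN-Hans")))  # False: Hans is not in the list
```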
This is what I found in the Dublin Core / W3C XSDs (http://www.w3.org/2001/XMLSchema):
<xs:simpleType name="language" id="language">
  <xs:annotation>
    <xs:documentation
      source="http://www.w3.org/TR/xmlschema-2/#language"/>
  </xs:annotation>
  <xs:restriction base="xs:token">
    <xs:pattern
      value="[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*"
      id="language.pattern">
      <xs:annotation>
        <xs:documentation
          source="http://www.ietf.org/rfc/rfc3066.txt">
          pattern specifies the content of section 2.12 of XML 1.0e2
          and RFC 3066 (Revised version of RFC 1766).
        </xs:documentation>
      </xs:annotation>
    </xs:pattern>
  </xs:restriction>
</xs:simpleType>
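So the pattern is the following (XSD patterns always match the whole value, so add ^ and $ if you reuse it in another regex engine):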
[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*
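A hedged Python sketch of how this XSD pattern might be reused outside a schema. Because XSD patterns are implicitly anchored, `re.fullmatch` is used to get the same whole-value behaviour; `is_xsd_language` is a made-up helper name. Note how loose this pattern is compared with the answers above: it only checks subtag lengths and characters, not case or subtag position.

```python
import re

# No ^ or $ here: re.fullmatch reproduces XSD's implicit whole-value anchoring.
XSD_LANGUAGE_RE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_xsd_language(value: str) -> bool:
    """Return True if `value` matches the xs:language lexical pattern."""
    return XSD_LANGUAGE_RE.fullmatch(value) is not None


for value in ["en-GB", "az-AZ-Cyrl", "de-419", "x-klingon", "en_GB", "toolongsubtag"]:
    print(value, is_xsd_language(value))
```

Here de-419 and x-klingon pass, en_GB fails on the underscore, and toolongsubtag fails because no subtag may exceed 8 characters.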
According to https://en.wikipedia.org/wiki/IETF_language_tag, the regexp can be:
/^[a-z]{2,3}(?:-[a-zA-Z]{4})?(?:-[A-Z]{2,3})?$/
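Note that this pattern puts the script subtag before the region, as the Wikipedia article describes, so it accepts az-Cyrl-AZ but rejects the question's az-AZ-Cyrl. A small Python check (the name `BCP47_SIMPLE_RE` is illustrative):

```python
import re

# Wikipedia-based pattern: language[-Script][-REGION]
BCP47_SIMPLE_RE = re.compile(r"^[a-z]{2,3}(?:-[a-zA-Z]{4})?(?:-[A-Z]{2,3})?$")

for code in ["en-GB", "az-Cyrl-AZ", "sr-Latn", "az-AZ-Cyrl"]:
    # Prints True for the first three and False for az-AZ-Cyrl.
    print(code, bool(BCP47_SIMPLE_RE.fullmatch(code)))
```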
From the Wikipedia article, a language tag is composed of:
a single primary language subtag based on a two-letter language code from ISO 639-1 (2002) or a three-letter code from ISO 639-2 (1998), ISO 639-3 (2007) or ISO 639-5 (2008), or registered through the BCP 47 process and composed of five to eight letters;
an optional script subtag, based on a four-letter script code from ISO 15924 (usually written in title case);
an optional region subtag based on a two-letter country code from ISO 3166-1 alpha-2 (usually written in upper case), or a three-digit code from UN M.49 for geographical regions;
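The three rules above could be sketched as one Python pattern with named groups. This is a possible extension of the previous regex, not something taken from the Wikipedia article: it also accepts 5-8 letter registered language subtags and three-digit UN M.49 regions, assumes the conventional casing (lower-case language, title-case script, upper-case region), and ignores the variant, extension and private-use subtags that BCP 47 also allows. `parse_language_tag` is a hypothetical helper. BCP 47 tags are case-insensitive, so you could add `re.IGNORECASE` if you want to accept other casings.

```python
import re
from typing import Optional

# Covers only language, script and region; assumes conventional casing.
LANG_TAG_RE = re.compile(
    r"""
    (?P<language>[a-z]{2,3}|[a-z]{5,8})  # ISO 639 code, or 5-8 letter registered subtag
    (?:-(?P<script>[A-Z][a-z]{3}))?      # optional ISO 15924 script, title case
    (?:-(?P<region>[A-Z]{2}|[0-9]{3}))?  # optional ISO 3166-1 alpha-2 or UN M.49 code
    """,
    re.VERBOSE,
)


def parse_language_tag(tag: str) -> Optional[dict]:
    """Split `tag` into its subtags, or return None if it does not match."""
    m = LANG_TAG_RE.fullmatch(tag)
    return m.groupdict() if m else None


print(parse_language_tag("az-Cyrl-AZ"))  # {'language': 'az', 'script': 'Cyrl', 'region': 'AZ'}
print(parse_language_tag("es-419"))      # {'language': 'es', 'script': None, 'region': '419'}
print(parse_language_tag("az-AZ-Cyrl"))  # None
```

A regex only checks the shape of a tag, not whether each subtag is actually registered: en-UK passes here even though the ISO 3166-1 code for the United Kingdom is GB.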