
CultureInfo and ISO 639-3

I'm looking for a way to construct a CultureInfo object from an ISO 639-3 language code. I couldn't find anything about this on MSDN, and searching the list of all cultures didn't work:

CultureInfo cInfo = CultureInfo.GetCultures(CultureTypes.AllCultures)
    .FirstOrDefault(r => String.Equals(r.ThreeLetterISOLanguageName, "CCH",
        StringComparison.CurrentCultureIgnoreCase));

This always returns null (note that "CCH" is a language code from the ISO 639-3 list).

Any ideas are appreciated, thanks!

Bidou asked Jan 10 '14 10:01


4 Answers

The MSDN documentation states that CultureInfo objects only expose the ISO 639-2 three-letter code and the ISO 639-1 two-letter code. That means you will need a mapping of some kind to link your ISO 639-3 code to a specific CultureInfo instance.

This Wikipedia page has a table with the mappings. You could copy it into an XML file and embed that as a resource in a class library to provide the mapping, or simply define a static Dictionary<string,string> somewhere.
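A minimal sketch of the static dictionary idea, assuming a hand-maintained table. The class name `Iso6393Cultures`, the method `FromIso6393`, and the three sample entries are illustrative and not from the original answer; a real table would be filled from the ISO 639-3 data.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class Iso6393Cultures
{
    // Hypothetical mapping from ISO 639-3 codes to .NET culture names.
    // Only three sample entries are listed here.
    private static readonly Dictionary<string, string> CultureNameByIso6393 =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "eng", "en" },
            { "fra", "fr" },
            { "deu", "de" },
        };

    // Returns the mapped CultureInfo, or null when the code is not in the table.
    public static CultureInfo FromIso6393(string code)
    {
        string cultureName;
        return CultureNameByIso6393.TryGetValue(code, out cultureName)
            ? CultureInfo.GetCultureInfo(cultureName)
            : null;
    }
}
```

With this table, `Iso6393Cultures.FromIso6393("FRA")` yields the neutral French culture, while a code that is not listed, such as "cch", yields null. The case-insensitive comparer avoids the culture-sensitive comparison used in the question.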

Alternatively, I'm sure there will be a 3rd party library that can do this for you (though I don't know of any off the top of my head).

edit:

I hadn't realised that only some ISO 639-3 codes have a mapping to ISO 639-2 codes. The problem here is that the CultureInfo class isn't designed to handle the ISO 639-3 specification, so you may have to find a completely different 3rd party implementation of CultureInfo that supports it, or write one yourself.

theyetiman answered Sep 29 '22 00:09


I had a similar need to convert between ISO 639-2B/T and ISO 639-3 formats. I created a T4 template solution that generates a list of all 7,000+ entries at compile time. I could have used a dictionary instead of a list, but I search on multiple fields, so a single key would not add much value.

Download and extract the tab-delimited text file from http://www-01.sil.org/iso639-3/download.asp, copy it to your project path, and rename it as appropriate (the template below expects iso-639-3.tab).

Create a design-time template file (see https://msdn.microsoft.com/en-us/library/dd820620.aspx):

<#@ template debug="true" hostspecific="true" language="C#" #>
<#@ output extension=".cs" #>
<#@ assembly name="System.Core" #>
<#@ assembly name="Microsoft.VisualBasic.dll" #> 
<#@ import namespace="System.Linq" #>
<#@ import namespace="System.Text" #>
<#@ import namespace="System.Collections.Generic" #>
<#@ import namespace="Microsoft.VisualBasic.FileIO" #>

// Generated code
using System.Collections.Generic;

namespace Foo
{
    // ISO 639-3
    // http://www-01.sil.org/iso639-3/download.asp
    public class ISO_639_3
    {
        // The three-letter 639-3 identifier
        public string Id { get; set; }
        // Equivalent 639-2 identifier of the bibliographic applications code set, if there is one
        public string Part2B { get; set; }
        // Equivalent 639-2 identifier of the terminology applications code set, if there is one
        public string Part2T { get; set; }
        // Equivalent 639-1 identifier, if there is one
        public string Part1 { get; set; }
        // I(ndividual), M(acrolanguage), S(pecial)
        public string Scope { get; set; }
        // A(ncient), C(onstructed), E(xtinct), H(istorical), L(iving), S(pecial)
        public string Language_Type { get; set; }
        // Reference language name
        public string Ref_Name { get; set; }
        // Comment relating to one or more of the columns
        public string Comment { get; set; }

        // Create a list of all known codes
        public static List<ISO_639_3> Create()
        {
            List<ISO_639_3> list = new List<ISO_639_3> {
<# 
    // Setup text parser
    string filename = this.Host.ResolvePath("iso-639-3.tab"); 
    TextFieldParser tfp = new TextFieldParser(filename)
    {
        TextFieldType = FieldType.Delimited,
        // The SIL file is tab-delimited; also splitting on "," would break any field that contains a comma
        Delimiters = new[] { "\t" },
        HasFieldsEnclosedInQuotes = true,
        TrimWhiteSpace = true
    };

    // Read first row as header
    string[] header = tfp.ReadFields();

    // Read rows from file
    // For debugging limit the row count
    //int maxrows = 10;
    int maxrows = int.MaxValue;
    int rowcount = 0;
    string term = "";
    while (!tfp.EndOfData && rowcount < maxrows)
    {
        // Read row of data from the file
        string[] row = tfp.ReadFields();
        rowcount ++;

        // Add "," on all but last line
        term = tfp.EndOfData || rowcount >= maxrows ? "" : ",";

        // Add new item from row data
#>
                new ISO_639_3 { Id = "<#=row[0]#>", Part2B = "<#=row[1]#>", Part2T = "<#=row[2]#>", Part1 = "<#=row[3]#>", Scope = "<#=row[4]#>", Language_Type = "<#=row[5]#>", Ref_Name = "<#=row[6]#>", Comment = "<#=row[7]#>" }<#=term#>
<# 
    } 
#>  
            };
            return list;
        }

    }

}

The generated code contains an initializer for a list with all the languages. The file is big: it slows down editing and takes a long time to compile, so keep it unloaded unless you need it. Snippet:

public static List<ISO_639_3> Create()
{
    List<ISO_639_3> list = new List<ISO_639_3> {
        new ISO_639_3 { Id = "aaa", Part2B = "", Part2T = "", Part1 = "", Scope = "I", Language_Type = "L", Ref_Name = "Ghotuo", Comment = "" },
        new ISO_639_3 { Id = "aab", Part2B = "", Part2T = "", Part1 = "", Scope = "I", Language_Type = "L", Ref_Name = "Alumu-Tesu", Comment = "" },
        new ISO_639_3 { Id = "aac", Part2B = "", Part2T = "", Part1 = "", Scope = "I", Language_Type = "L", Ref_Name = "Ari", Comment = "" },

Use the generated list to map as needed, e.g.

    public static ISO_639_3 GetISO_639_3(string language)
    {
        // Create list if it does not exist
        if (Program.Default.ISO6393List == null)
        {
            Program.Default.ISO6393List = ISO_639_3.Create();
        }

        // Match the input string type
        ISO_639_3 lang = null;
        if (language.Length > 3 && language.ElementAt(2) == '-')
        {
            // Treat the language as a culture form, e.g. en-us
            CultureInfo cix = new CultureInfo(language);

            // Recursively call using the ISO 639-2 code
            return GetISO_639_3(cix.ThreeLetterISOLanguageName);
        }
        else if (language.Length > 3)
        {
            // Try long form
            lang = Program.Default.ISO6393List.Where(item => item.Ref_Name.Equals(language, StringComparison.OrdinalIgnoreCase)).FirstOrDefault();
            if (lang != null)
                return lang;
        }
        else if (language.Length == 3)
        {

            // Try 639-3
            lang = Program.Default.ISO6393List.Where(item => item.Id.Equals(language, StringComparison.OrdinalIgnoreCase)).FirstOrDefault();
            if (lang != null)
                return lang;

            // Try the 639-2/B
            lang = Program.Default.ISO6393List.Where(item => item.Part2B.Equals(language, StringComparison.OrdinalIgnoreCase)).FirstOrDefault();
            if (lang != null)
                return lang;

            // Try the 639-2/T
            lang = Program.Default.ISO6393List.Where(item => item.Part2T.Equals(language, StringComparison.OrdinalIgnoreCase)).FirstOrDefault();
            if (lang != null)
                return lang;
        }
        else if (language.Length == 2)
        {
            // Try 639-1
            lang = Program.Default.ISO6393List.Where(item => item.Part1.Equals(language, StringComparison.OrdinalIgnoreCase)).FirstOrDefault();
            if (lang != null)
                return lang;
        }

        // Not found
        return lang;
    }
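To get back to a CultureInfo, as the question asks, one option is to use the 639-1 equivalent of the matched entry. The following is a sketch under assumptions: `Iso6393Entry` is a minimal stand-in for the generated `ISO_639_3` class, and `CultureLookup.ToCultureInfo` is a hypothetical helper, not part of the original answer.

```csharp
using System.Globalization;

// Minimal stand-in for the generated ISO_639_3 class; only the fields used here.
public class Iso6393Entry
{
    public string Id { get; set; }
    public string Part1 { get; set; }
}

public static class CultureLookup
{
    // Returns a neutral CultureInfo for entries that have a 639-1 equivalent,
    // or null when there is none or the platform does not know the culture.
    public static CultureInfo ToCultureInfo(Iso6393Entry entry)
    {
        if (entry == null || string.IsNullOrEmpty(entry.Part1))
            return null;

        try
        {
            return CultureInfo.GetCultureInfo(entry.Part1);
        }
        catch (CultureNotFoundException)
        {
            // No culture data is available for this language code.
            return null;
        }
    }
}
```

Entries without a 639-1 code, which is the majority of the 639-3 list, return null here, so callers still need a fallback for those languages.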

PieterV answered Sep 29 '22 01:09


I found myself needing an enum for ISO 639-3. If you don't actually need to map it to CultureInfo, then maybe this will help:

http://snipplr.com/view/76196/enum-for-iso-6393-language-codes/

Mike answered Sep 29 '22 01:09


Take a look at T4 text templates (*.tt).

A template regenerates its output file whenever you save it in your project:

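<#@ template language="C#" #>
<#@ assembly name="System.Core" #>
<#@ import namespace="System.Globalization" #>
<#@ import namespace="System.Linq" #>
<#@ output extension=".cs" #>
namespace YourProject.Enum
{
    enum eLanguage
    {
        Unknown,
<#
    // Many cultures share one language code, so emit each code only once.
    // The "@" prefix keeps codes that are C# keywords (e.g. "is", "as") valid identifiers.
    var codes = CultureInfo.GetCultures(CultureTypes.AllCultures)
        .Select(c => c.TwoLetterISOLanguageName)
        .Where(code => code != "iv") // skip the invariant culture
        .Distinct()
        .OrderBy(code => code);
    foreach (var code in codes) { #>
        @<#= code #>,
<#  } #>
        Other
    }
}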

Tim answered Sep 29 '22 01:09