String StartsWith() issue with Danish text

Can anyone explain this behaviour?

using System.Globalization;
using System.Threading;

var culture = new CultureInfo("da-DK");
Thread.CurrentThread.CurrentCulture = culture;
"daab".StartsWith("da"); // false

I know it can be fixed by specifying StringComparison.InvariantCulture, but I'm confused by the behavior.
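A minimal sketch of that fix, assuming a console app on .NET Core or later. The `PrefixChecks` class and its method names are hypothetical; only the BCL calls inside them (`CompareInfo.IsPrefix`, `String.StartsWith(string, StringComparison)`) are real APIs. It contrasts a culture-sensitive check with an explicit `da-DK` culture against `StringComparison.Ordinal` and `StringComparison.InvariantCulture`. The `da-DK` result may vary with the runtime's globalization backend (for example NLS versus ICU), so it is printed rather than predicted.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class; only the BCL calls inside are real APIs.
public static class PrefixChecks
{
    // Culture-sensitive prefix test with an explicitly supplied culture,
    // so the result does not depend on Thread.CurrentThread.CurrentCulture.
    public static bool StartsWithCulture(string text, string prefix, CultureInfo culture) =>
        culture.CompareInfo.IsPrefix(text, prefix, CompareOptions.None);

    // Ordinal prefix test: compares UTF-16 code units, no linguistic rules.
    public static bool StartsWithOrdinal(string text, string prefix) =>
        text.StartsWith(prefix, StringComparison.Ordinal);

    // Invariant-culture prefix test (the fix mentioned in the question).
    public static bool StartsWithInvariant(string text, string prefix) =>
        text.StartsWith(prefix, StringComparison.InvariantCulture);

    public static void Main()
    {
        var danish = new CultureInfo("da-DK");
        // The da-DK result depends on the runtime's collation data (NLS or ICU).
        Console.WriteLine($"da-DK:     {StartsWithCulture("daab", "da", danish)}");
        Console.WriteLine($"Ordinal:   {StartsWithOrdinal("daab", "da")}");
        Console.WriteLine($"Invariant: {StartsWithInvariant("daab", "da")}");
    }
}
```

Passing the comparison type (or the culture) explicitly also keeps the result independent of whatever `CurrentCulture` the thread happens to have.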

I also know that "aA" and "AA" are not considered equal in a Danish case-insensitive comparison (see http://msdn.microsoft.com/en-us/library/xk2wykcz.aspx), which explains this:

String.Compare("aA", "AA", new CultureInfo("da-DK"), CompareOptions.IgnoreCase); // -1 (not equal)

Is this linked to the behavior of the first code snippet?

asked Jun 12 '26 12:06 by Matt Warren


2 Answers

Here is a test that illustrates the problem. "daab" and "dåb" (the same word in old and modern spelling, respectively) mean baptism/christening.

using System.Globalization;
using System.Threading;
using Xunit;

public class can_handle_remnant_of_danish_language
{
    [Fact]
    public void daab_start_with_då()
    {
        var culture = new CultureInfo("da-DK");
        Thread.CurrentThread.CurrentCulture = culture;
        Assert.True("daab".StartsWith("då")); // Fails
    }

    [Fact]
    public void daab_start_with_da()
    {
        var culture = new CultureInfo("da-DK");
        Thread.CurrentThread.CurrentCulture = culture;
        Assert.True("daab".StartsWith("da")); // Fails
    }

    [Fact]
    public void daab_start_with_daa()
    {
        var culture = new CultureInfo("da-DK");
        Thread.CurrentThread.CurrentCulture = culture;
        Assert.True("daab".StartsWith("daa")); // Succeeds
    }

    [Fact]
    public void dåb_start_with_daa()
    {
        var culture = new CultureInfo("da-DK");
        Thread.CurrentThread.CurrentCulture = culture;
        Assert.True("dåb".StartsWith("daa")); // Fails
    }

    [Fact]
    public void dåb_start_with_da()
    {
        var culture = new CultureInfo("da-DK");
        Thread.CurrentThread.CurrentCulture = culture;
        Assert.True("dåb".StartsWith("da")); // Fails
    }

    [Fact]
    public void dåb_start_with_då()
    {
        var culture = new CultureInfo("da-DK");
        Thread.CurrentThread.CurrentCulture = culture;
        Assert.True("dåb".StartsWith("då")); // Succeeds
    }
}

All the tests above should succeed according to my understanding of the language, and I'm Danish! I don't have a degree in grammar, though. :-)

Seems like a bug to me.
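The same six cases can be run without mutating `Thread.CurrentThread.CurrentCulture` by passing the culture to `CompareInfo.IsPrefix` directly. This is only a sketch: `DanishPrefixDemo` and `IsPrefixIn` are hypothetical names, and the `da-DK` column reflects whatever collation data the runtime ships (NLS or ICU), so no particular output is claimed here.

```csharp
using System;
using System.Globalization;

// Hypothetical demo class; only CompareInfo.IsPrefix is a real BCL API here.
public static class DanishPrefixDemo
{
    // The word/prefix pairs from the xUnit tests above
    // ("daab" is the old spelling of "dåb").
    static readonly (string Text, string Prefix)[] Cases =
    {
        ("daab", "då"), ("daab", "da"), ("daab", "daa"),
        ("dåb", "daa"), ("dåb", "da"), ("dåb", "då"),
    };

    // Culture-sensitive prefix test with an explicit culture, so nothing
    // depends on (or modifies) the current thread's culture.
    public static bool IsPrefixIn(CultureInfo culture, string text, string prefix) =>
        culture.CompareInfo.IsPrefix(text, prefix, CompareOptions.None);

    public static void Main()
    {
        var danish = new CultureInfo("da-DK");
        foreach (var (text, prefix) in Cases)
        {
            // Print both results side by side for comparison.
            Console.WriteLine(
                $"{text,-5} starts with {prefix,-4} da-DK: {IsPrefixIn(danish, text, prefix),-5} " +
                $"invariant: {IsPrefixIn(CultureInfo.InvariantCulture, text, prefix)}");
        }
    }
}
```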

answered Jun 15 '26 01:06 by Lars Udengaard


Like Nappy said, it's a feature of the Danish language, where "aa" and "å" are still considered the same letter. Danish has two more special letters, æ and ø, but I'm not sure whether they can also be written as two letters.

I think that in the second example "aA" is left unchanged while "AA" is treated as "Å". To confuse things even more, "Aa" is considered equal to "AA" and "aa" only in a case-insensitive comparison.
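To check these claims on a given runtime, a small sketch like the following prints the `da-DK` comparison results with and without `CompareOptions.IgnoreCase`. `DanishCaseDemo` and `CompareIn` are hypothetical names, and the actual numbers depend on the runtime's collation data, so they are printed rather than predicted. `CompareInfo.Compare` returns zero when the culture treats the strings as equal and a negative or positive value otherwise.

```csharp
using System;
using System.Globalization;

// Hypothetical demo class; only CompareInfo.Compare is a real BCL API here.
public static class DanishCaseDemo
{
    // Compares two strings with the given culture's collation rules;
    // 0 means the culture treats them as equal.
    public static int CompareIn(CultureInfo culture, string a, string b, CompareOptions options) =>
        culture.CompareInfo.Compare(a, b, options);

    public static void Main()
    {
        var danish = new CultureInfo("da-DK");
        // Pairs from the discussion: mixed-case "aa" variants and "aa" vs "å".
        var pairs = new[] { ("aA", "AA"), ("Aa", "AA"), ("Aa", "aa"), ("aa", "å"), ("AA", "Å") };
        foreach (var (a, b) in pairs)
        {
            Console.WriteLine(
                $"{a} vs {b}: None={CompareIn(danish, a, b, CompareOptions.None)}, " +
                $"IgnoreCase={CompareIn(danish, a, b, CompareOptions.IgnoreCase)}");
        }
    }
}
```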

answered Jun 15 '26 02:06 by Martin Brenden