
Default ordering in C# vs. F#

Consider the two fragments of code that simply order strings in C# and F# respectively:

C#:

var strings = new[] { "Tea and Coffee", "Telephone", "TV" };
var orderedStrings = strings.OrderBy(s => s).ToArray();

F#:

let strings = [| "Tea and Coffee"; "Telephone"; "TV" |]
let orderedStrings =
    strings
    |> Seq.sortBy (fun s -> s)
    |> Seq.toArray

These two fragments of code return different results:

  • C#: Tea and Coffee, Telephone, TV
  • F#: TV, Tea and Coffee, Telephone

In my specific case I need to correlate the ordering logic between these two languages (one is production code, and one is part of a test assertion). This poses a few questions:

  • Is there an underlying reason for the differences in ordering logic?
  • What is the recommended way to overcome this "problem" in my situation?
  • Is this phenomenon specific to strings, or does it apply to other .NET types too?

EDIT

In response to several probing comments, running the fragments below reveals more about exactly how the two orderings differ:

F#:

let strings = [| "UV"; "Uv"; "uV"; "uv"; "Tv"; "TV"; "tv"; "tV" |]
let orderedStrings =
    strings
    |> Seq.sortBy (fun s -> s)
    |> Seq.toArray

C#:

var strings = new[] { "UV", "Uv", "uv", "uV", "TV", "tV", "Tv", "tv" };
var orderedStrings = strings.OrderBy(s => s).ToArray();

Gives:

  • C#: tv, tV, Tv, TV, uv, uV, Uv, UV
  • F#: TV, Tv, UV, Uv, tV, tv, uV, uv

The lexicographic ordering of strings differs because of a difference in the underlying order of characters:

  • C#: "aAbBcCdD...tTuUvV..."
  • F#: "ABC..TUV..Zabc..tuv.."

Asked by Lawrence, Jun 23 '15 09:06


1 Answer

See section 8.15.6 of the F# language spec.

Strings, arrays, and native integers have special comparison semantics; everything else just goes to IComparable if that's implemented (modulo various optimizations that yield the same result).

In particular, F# strings use ordinal comparison by default, in contrast to most of .NET, which uses culture-aware comparison by default.

This is obviously a confusing incompatibility between F# and other .NET languages. However, it does have some benefits:

  • OCaml compatibility
  • String and char comparisons are consistent
    • C# Comparer<string>.Default.Compare("a", "A") // -1
    • C# Comparer<char>.Default.Compare('a', 'A') // 32
    • F# compare "a" "A" // 1
    • F# compare 'a' 'A' // 32
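
For the practical question of lining the two up, one option (a minimal sketch, not the only approach) is to make the C# side sort ordinally by passing StringComparer.Ordinal to OrderBy. The class and method names below, OrderingDemo and SortOrdinal, are hypothetical and exist only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderingDemo
{
    // Hypothetical helper: sorts by ordinal (UTF-16 code unit) comparison,
    // which is the rule F# applies to strings by default, instead of the
    // culture-aware comparison that OrderBy uses when no comparer is given.
    public static string[] SortOrdinal(IEnumerable<string> items) =>
        items.OrderBy(s => s, StringComparer.Ordinal).ToArray();
}
```

With the input from the question, OrderingDemo.SortOrdinal(new[] { "Tea and Coffee", "Telephone", "TV" }) returns TV, Tea and Coffee, Telephone, which is the order the F# fragment produces.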

Edit:

Note that it's misleading (though not incorrect) to state that "F# uses case-sensitive string comparison". F# uses ordinal comparison, which is stricter than just case-sensitive.

// culture-aware, case-sensitive comparison (C#)
StringComparer.InvariantCulture.Compare("[", "A") // -1
StringComparer.InvariantCulture.Compare("[", "a") // -1

// ordinal comparison (F#)
// (recall, '[' lands between the upper- and lower-case letters in the ASCII table)
compare "[" "A"  // 26
compare "[" "a"  // -6

Answered by latkin, Oct 11 '22 02:10