Why is F#'s Seq.sortBy much slower than LINQ's IEnumerable<T>.OrderBy extension method?

Tags: sorting, inline, f#

I've recently written a piece of code that reads some data from a file, stores each record in a tuple, and sorts all the collected data by the first element of the tuple. After some tests I've noticed that using Seq.sortBy (and Array.sortBy) is much slower than using IEnumerable.OrderBy. Below are two snippets of code which show the behaviour I'm talking about:


(filename
|> File.ReadAllLines
|> Array.Parallel.map(fun ln -> let arr = ln.Split([|' '|], StringSplitOptions.RemoveEmptyEntries) 
                                |> Array.map(double) 
                                |> Array.sort in arr.[0], arr.[1])
).OrderBy(new Func<_,_>(fun (a,b) -> a))

and


filename
|> File.ReadAllLines
|> Array.Parallel.map(fun ln -> let arr = ln.Split([|' '|], StringSplitOptions.RemoveEmptyEntries) |> Array.map(double) |> Array.sort in arr.[0], arr.[1])
|> Seq.sortBy(fun (a,_) -> a)

On a file containing 100000 lines, each made of two doubles, the latter version takes over twice as long as the first one on my computer (using Array.sortBy instead brings no improvement). Ideas?


asked Jul 02 '09 08:07

em70

1 Answer

The F# implementation uses a structural comparison of the resulting key.

let sortBy keyf seq =
    let comparer = ComparisonIdentity.Structural
    mkDelayedSeq (fun () -> 
        (seq 
        |> to_list 
        |> List.sortWith (fun x y -> comparer.Compare(keyf x,keyf y)) 
        |> to_array) :> seq<_>)

(also sort)

let sort seq =
    mkDelayedSeq (fun () -> 
        (seq 
        |> to_list 
        |> List.sortWith Operators.compare 
        |> to_array) :> seq<_>)

Both Operators.compare and ComparisonIdentity.Structural.Compare (eventually) become

let inline GenericComparisonFast<'T> (x:'T) (y:'T) : int = 
    GenericComparisonIntrinsic x y
        // lots of other types elided
        when 'T : float = if (# "clt" x y : bool #) 
                          then (-1) 
                          else (# "cgt" x y : int #)

but the route to this for Operators.compare is entirely inline, so the JIT compiler will end up inserting a direct double comparison instruction with no additional method invocation overhead except for the delegate invocation (which is required in both cases anyway).

sortBy uses a comparer, so it goes through an additional virtual method call, but the cost is basically about the same.

In comparison, the OrderBy function must also go through virtual method calls for the comparison (using Comparer<T>.Default), but the significant difference is that it sorts in place and uses the buffer created for this as the result. If you take a look at sortBy, you will see that it sorts the list (not in place; it uses the StableSortImplementation, which appears to be a merge sort) and then creates a copy of it as a new array. This additional copy (given the size of your input data) is likely the principal cause of the slowdown, though the differing sort implementations may also have an effect.

That said, this is all guesswork. If performance in this area is a concern for you, then you should simply profile to find out what is taking the time.
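
Before reaching for a full profiler, a rough timing comparison can be sketched along these lines. This is only a sketch under stated assumptions: `timeIt` is a hypothetical helper, the data is a synthetic stand-in for the parsed file (not the original input), and a single Stopwatch run includes JIT warm-up, so the numbers are indicative at best.

```fsharp
open System
open System.Diagnostics
open System.Linq

// Hypothetical helper: run a thunk once and return (result, elapsed milliseconds).
let timeIt (f: unit -> 'a) =
    let sw = Stopwatch.StartNew()
    let result = f ()
    sw.Stop()
    result, sw.Elapsed.TotalMilliseconds

// Assumption: synthetic data standing in for the file, 100000 pairs of doubles.
let rng = Random(42)
let data = Array.init 100000 (fun _ -> rng.NextDouble(), rng.NextDouble())

// Both sorts are lazy, so materialise the results to force the work to happen.
let viaSortBy, msSortBy =
    timeIt (fun () -> data |> Seq.sortBy fst |> Seq.toArray)

let viaOrderBy, msOrderBy =
    timeIt (fun () -> data.OrderBy(fun (a, _) -> a).ToArray())

printfn "Seq.sortBy: %.1f ms, OrderBy: %.1f ms" msSortBy msOrderBy
```

Running each variant several times and discarding the first measurement should give steadier figures than a single pass.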

If you wish to see what effect the sorting/copying change would have, try this alternative:

// these are taken from the f# source so as to be consistent
// beware doing this, the compiler may know about such methods
open System.Collections.Generic
let mkSeq f = 
    { new IEnumerable<'b> with 
        member x.GetEnumerator() = f()
      interface System.Collections.IEnumerable with 
        member x.GetEnumerator() = (f() :> System.Collections.IEnumerator) }

let mkDelayedSeq (f: unit -> IEnumerable<'T>) = 
    mkSeq (fun () -> f().GetEnumerator())

// the function
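let sortByFaster keyf seq =
    let comparer = ComparisonIdentity.Structural
    mkDelayedSeq (fun () -> 
        // copy the input once, then sort that buffer in place
        // (Seq.toArray is the current name of the old Seq.to_array)
        let buffer = Seq.toArray seq
        // sortInPlaceWith takes a comparison function;
        // sortInPlaceBy would expect a key projection instead
        Array.sortInPlaceWith (fun x y -> comparer.Compare(keyf x,keyf y)) buffer
        buffer :> seq<_>)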

I get a reasonable percentage speedup within the REPL with very large input sequences (over a million elements), but nothing like an order of magnitude. Your mileage, as always, may vary.
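
For comparison, the same copy-once-then-sort-in-place idea can be sketched with only the public Array functions, skipping the sequence wrappers. `sortByInPlaceCopy` is a hypothetical name, and this variant is eager rather than delayed. Note also that Array.sortInPlaceBy is documented as an unstable sort, so elements with equal keys may come out in a different order than with Seq.sortBy.

```fsharp
// Hypothetical helper: copy the input into one array, then sort that array in place.
let sortByInPlaceCopy (keyf: 'T -> 'Key) (source: seq<'T>) : 'T[] =
    let buffer = Seq.toArray source   // the only copy that is made
    Array.sortInPlaceBy keyf buffer   // sorts the buffer itself; not a stable sort
    buffer

// Usage: sort (key, label) pairs by the first element of the tuple.
let sortedPairs = sortByInPlaceCopy fst [ (3.0, "c"); (1.0, "a"); (2.0, "b") ]
printfn "%A" sortedPairs
```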


answered Sep 20 '22 15:09

ShuggyCoUk
