
F# vs. C# performance Signatures with sample code

There are many discussions on this topic already, but I am all about flogging dead horses, particularly when I discover they may still be breathing.

I was working on parsing the unusual and exotic file format that is the CSV, and for fun I decided to compare the performance of the two .NET languages I know, C# and F#.

The results were... unsettling. F# won, by a wide margin: a factor of 2 or more (and I actually think it's more like 0.5n, but getting real benchmarks is proving to be tough since I am testing against hardware IO).

Divergent performance characteristics in something as common as reading a CSV is surprising to me (note that the coefficient means that C# wins on very small files). The more testing I do, the more it feels like C# scales worse, which is both surprising and concerning, since it probably means I am doing it wrong.

Some notes: Core 2 Duo laptop, 80 GB spindle disk, 3 GB DDR 800 memory, Windows 7 64-bit Premium, .NET 4, no power options turned on.

30,000 lines, 5 fields wide, each a phrase of 10 chars or less, gives a factor of 3 in favor of the tail-call recursion after the first run (it appears to cache the file).

300,000 lines (same data repeated) gives a factor of 2 for the tail-call recursion, with F#'s mutable implementation winning out slightly, but the performance signatures suggest that I am hitting the disk rather than serving the whole file from RAM, which causes semi-random performance spikes.

F# code

//Module used to import data from an arbitrary CSV source
module CSVImport
open System.IO

//imports the data from a path into a list of string arrays
let ImportData (path:string) : List<string []> = 

    //recursively rips through the file, grabbing a line and consing it onto the list
    let rec readline (reader:StreamReader) (lines:List<string []>) : List<string []> =
        let line = reader.ReadLine()
        match line with
        | null -> lines
        | _ -> readline reader  (line.Split(',')::lines)

    //grab a file and open it, then return the parsed data
    use chaosfile = new StreamReader(path)
    readline chaosfile []

//a recreation of the above function using a while loop
let ImportDataWhile (path:string) : list<string []> =
    use chaosfile = new StreamReader(path)
    //values assigned in a loop construct must be mutable
    let mutable retval = []
    //loop
    while not chaosfile.EndOfStream do
        retval <- chaosfile.ReadLine().Split(',')::retval 
    //return retval by just declaring it
    retval

let CSVlines (path:string) : string seq =
    seq { use streamreader = new StreamReader(path)
          while not streamreader.EndOfStream do
            yield streamreader.ReadLine() }

let ImportDataSeq (path:string) : string [] list =
    let mutable retval = []
    let sequencer = CSVlines path
    for line in sequencer do
        retval <- line.Split()::retval
    retval

C# Code

using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
using System.Text;

namespace CSVparse
{
    public class CSVprocess
    {
        public static List<string[]> ImportDataC(string path)
        {
            List<string[]> retval = new List<string[]>();
            using(StreamReader readfile = new StreamReader(path))
            {
                string line = readfile.ReadLine();
                while (line != null)
                {
                    retval.Add(line.Split());
                    line = readfile.ReadLine();
                }
            } 

            return retval;
        }

        public static List<string[]> ImportDataReadLines(string path)
        {
            List<string[]> retval = new List<string[]>();
            IEnumerable<string> toparse = File.ReadLines(path);

            foreach (string split in toparse)
            {
                retval.Add(split.Split());
            }
            return retval;
        }
    }

}

Note the variety of implementations there: iterators, sequences, tail-call optimization, while loops in two languages...

A major issue is that I am hitting the disk, so some idiosyncrasies can be accounted for by that. I intend to rewrite this code to read from a memory stream (which should be more consistent, assuming I don't start to swap).
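A minimal sketch of that memory-stream rewrite might look like the following (the class and method names here are illustrative, not from the code above). Taking a TextReader lets the same parsing code be timed against either the disk or an in-memory buffer:

```csharp
using System.Collections.Generic;
using System.IO;

public static class MemoryBench
{
    // Parse CSV rows from any TextReader, so the same code can be timed
    // against a StreamReader (disk) or a MemoryStream/StringReader (RAM).
    public static List<string[]> ImportFromReader(TextReader reader)
    {
        var retval = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
            retval.Add(line.Split(','));
        return retval;
    }

    // Pay the disk cost once up front, then parse entirely from memory.
    public static List<string[]> ImportViaMemory(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        using (var ms = new MemoryStream(bytes))
        using (var reader = new StreamReader(ms))
            return ImportFromReader(reader);
    }
}
```

Timing only the ImportFromReader call over the MemoryStream isolates parsing cost from I/O.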

But everything I have been taught or read says that while loops/for loops are faster than tail-call recursion, and every actual benchmark I run says the dead opposite.

So I guess my question is, should I question the conventional wisdom?

Is tail call recursion really better than while loops in the .net ecosystem?

How does this work out on Mono?

Asked Feb 02 '11 by Snark

3 Answers

I think that the difference may arise from the different list types in F# and C#. F# uses singly linked lists (see http://msdn.microsoft.com/en-us/library/dd233224.aspx) whereas C# uses System.Collections.Generic.List, which is based on arrays.

Prepending an element is much faster for singly linked lists, especially when you're parsing big files (with an array-backed list you need to reallocate and copy the whole backing array from time to time).

Try using a LinkedList<T> in the C# code - I'm curious about the results :) ...
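A sketch of that suggestion (the class name CSVprocessLinked is hypothetical): the only change from ImportDataC is the container, so each append avoids any backing-array reallocation and copying, mirroring F#'s cons:

```csharp
using System.Collections.Generic;
using System.IO;

public class CSVprocessLinked
{
    // Same shape as ImportDataC, but accumulating into a LinkedList:
    // AddLast never triggers an array resize, unlike List<T>.Add.
    public static LinkedList<string[]> ImportDataLinked(string path)
    {
        var retval = new LinkedList<string[]>();
        using (var readfile = new StreamReader(path))
        {
            string line;
            while ((line = readfile.ReadLine()) != null)
                retval.AddLast(line.Split(','));
        }
        return retval;
    }
}
```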

PS: Also, this would be a good example of when to use a profiler. You could easily find the "hot spot" of the C# code...

EDIT

So, I tried this for myself: I used two identical files in order to prevent caching effects. The files were 3,000,000 lines, each with 10 times 'abcdef', separated by commas.

The main program looks like this:

static void Main(string[] args) {
   var dt = DateTime.Now;
   CSVprocess.ImportDataC("test.csv"); // C# implementation
   System.Console.WriteLine("Time {0}", DateTime.Now - dt);
   dt = DateTime.Now;
   CSVImport.ImportData("test1.csv"); // F# implementation
   System.Console.WriteLine("Time {0}", DateTime.Now - dt);
}

(I also tried it with first executing the F# implementation and then the C#...)

The result is:

  • C#: 3.7 seconds
  • F#: 7.6 seconds

Running the C# solution after the F# solution gives the same performance for the F# version but 4.7 seconds for C# (I assume due to heavy memory allocation by the F# solution). Running each solution alone doesn't change the above results.

Using a file with 6,000,000 lines gives ~7 seconds for the C# solution; the F# solution produces an OutOfMemoryException (I'm running this on a machine with 12 GB of RAM...).

So for me it seems that the conventional 'wisdom' is true: C# using a simple loop is faster for this kind of task...

Answered Nov 09 '22 by MartinStettner


You really, really, really, really shouldn't be reading anything into these results - either benchmark your entire system as a form of system test, or remove the disk I/O from the benchmark. It's just going to confuse matters. It's probably better practice to take a TextReader parameter rather than a physical path, to avoid tying the implementation to physical files.

Additionally, as a microbenchmark your test has a few other flaws:

  • You define numerous functions that aren't called during the benchmark. Are you testing ImportDataC or ImportDataReadLines? Pick and choose for clarity - and in real applications, don't duplicate implementations, but factor out similarities and define one in terms of the other.
  • You're calling .Split(',') in F# but .Split() in C# - do you intend to split on commas or on whitespace?
  • You're reinventing the wheel - at least compare your implementation with the much shorter versions using higher-order functions (aka LINQ).
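For that last point, a minimal higher-order version might look like this (CsvLinq is an illustrative name; it splits on commas, matching the F# code):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvLinq
{
    // The whole import as one pipeline: lazily enumerate lines,
    // split each on commas, and materialize the result.
    public static List<string[]> Import(string path)
    {
        return File.ReadLines(path).Select(line => line.Split(',')).ToList();
    }
}
```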
Answered Nov 09 '22 by Eamon Nerbonne


I note that it looks like your F# is using the F# list whereas the C# is using the .NET List. You might try changing the F# code to use another list type and gather more data.

Answered Nov 09 '22 by Brian