
How does IEnumerable differ from IObservable under the hood?

I'm curious how IEnumerable differs from IObservable under the hood. I understand the pull and push patterns respectively, but how does C#, in terms of memory and so on, notify subscribers (for IObservable) that they should receive the next piece of data to process? How does the observed instance know that its data has changed and needs to be pushed to the subscribers?

My question comes from a test in which I read lines from a file. The file was about 6 MB in total.

Standard Time Taken: 4.7s, lines: 36587

Rx Time Taken: 0.68s, lines: 36587

How is Rx able to improve so massively on a normal iteration over each of the lines in the file?

private static void ReadStandardFile()
{
    var timer = Stopwatch.StartNew();
    var linesProcessed = 0;

    foreach (var l in ReadLines(new FileStream(_filePath, FileMode.Open)))
    {
        var s = l.Split(',');
        linesProcessed++;
    }

    timer.Stop();

    _log.DebugFormat("Standard Time Taken: {0}s, lines: {1}",
        timer.Elapsed.ToString(), linesProcessed);
}

private static void ReadRxFile()
{
    var timer = Stopwatch.StartNew();
    var linesProcessed = 0;

    var query = ReadLines(new FileStream(_filePath, FileMode.Open)).ToObservable();

    using (query.Subscribe((line) =>
    {
        var s = line.Split(',');
        linesProcessed++;
    }));

    timer.Stop();

    _log.DebugFormat("Rx Time Taken: {0}s, lines: {1}",
        timer.Elapsed.ToString(), linesProcessed);
}

private static IEnumerable<string> ReadLines(Stream stream)
{
    using (StreamReader reader = new StreamReader(stream))
    {
        while (!reader.EndOfStream)
            yield return reader.ReadLine();
    }
}
David asked Mar 16 '12 17:03

2 Answers

My hunch is that the behavior you're seeing reflects the OS caching the file. I would imagine that if you reversed the order of the calls, you would see a similar difference in speeds, just swapped.

You could improve this benchmark by performing a few warm-up runs or by copying the input file to a temp file using File.Copy prior to testing each one. This way neither run would benefit from a file that the other had already made "hot", and you would get a fair comparison.
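
The warm-up idea can be sketched roughly as follows. This is only a minimal illustration, not the code from the question: the class name FairBenchmark, the helpers CountLines and Measure, and the three-line sample file are all made up for the example, and CountLines merely stands in for either ReadStandardFile or ReadRxFile.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

static class FairBenchmark
{
    // Same shape as the ReadLines iterator in the question.
    static IEnumerable<string> ReadLines(string path)
    {
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }

    // Hypothetical stand-in for ReadStandardFile / ReadRxFile:
    // splits every line and returns the number of lines seen.
    public static int CountLines(string path)
    {
        var lines = 0;
        foreach (var l in ReadLines(path))
        {
            var s = l.Split(',');
            lines++;
        }
        return lines;
    }

    // Runs one untimed warm-up pass, then times `runs` further passes.
    public static TimeSpan[] Measure(Func<string, int> candidate, string path, int runs)
    {
        // Warm-up: lets the OS cache the file and the JIT compile the code
        // before anything is timed.
        candidate(path);

        var results = new TimeSpan[runs];
        for (var i = 0; i < runs; i++)
        {
            var timer = Stopwatch.StartNew();
            candidate(path);
            timer.Stop();
            results[i] = timer.Elapsed;
        }
        return results;
    }

    static void Main()
    {
        // Assumption: a tiny temp file replaces the 6 MB input from the question.
        var path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "a,b,c", "d,e,f", "g,h,i" });
        try
        {
            foreach (var t in Measure(CountLines, path, 3))
                Console.WriteLine("Time taken: {0:F4}s", t.TotalSeconds);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Passing each reader through the same Measure call means both start from the same cache state, so whichever one happens to run first no longer pays the cost of the cold read.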

user7116 answered Sep 28 '22 01:09

I'd suspect that you're seeing some kind of internal optimization by the CLR. It probably caches the content of the file in memory between the two calls so that ToObservable can pull the content much faster...

Edit: Oh, the good colleague with the crazy nickname, @sixlettervariables, was faster, and he's probably right: it's the OS doing the optimizing rather than the CLR.

Paul Michalik answered Sep 28 '22 02:09
