I'm curious as to how IEnumerable
differs from IObservable
under the hood. I understand the pull and push patterns respectively but how does C#, in terms of memory etc, notify subscribers (for IObservable) that it should receive the next bit of data in memory to process? How does the observed instance know it's had a change in data to push to the subscribers.
My question comes from a test I was performing reading in lines from a file. The file was about 6Mb in total.
Standard Time Taken: 4.7s, lines: 36587
Rx Time Taken: 0.68s, lines: 36587
How is Rx able to massively improve a normal iteration over each of the lines in the file?
private static void ReadStandardFile()
{
var timer = Stopwatch.StartNew();
var linesProcessed = 0;
foreach (var l in ReadLines(new FileStream(_filePath, FileMode.Open)))
{
var s = l.Split(',');
linesProcessed++;
}
timer.Stop();
_log.DebugFormat("Standard Time Taken: {0}s, lines: {1}",
timer.Elapsed.ToString(), linesProcessed);
}
private static void ReadRxFile()
{
var timer = Stopwatch.StartNew();
var linesProcessed = 0;
var query = ReadLines(new FileStream(_filePath, FileMode.Open)).ToObservable();
using (query.Subscribe((line) =>
{
var s = line.Split(',');
linesProcessed++;
}));
timer.Stop();
_log.DebugFormat("Rx Time Taken: {0}s, lines: {1}",
timer.Elapsed.ToString(), linesProcessed);
}
private static IEnumerable<string> ReadLines(Stream stream)
{
using (StreamReader reader = new StreamReader(stream))
{
while (!reader.EndOfStream)
yield return reader.ReadLine();
}
}
My hunch is the behavior you're seeing is reflecting the OS caching the file. I would imagine if you reversed the order of the calls you would see a similar difference in speeds, just swapped.
You could improve this benchmark by performing a few warm-up runs or by copying the input file to a temp file using File.Copy
prior to testing each one. This way the file would not be "hot" and you would get a fair comparison.
I'd suspect that you're seeing some kind of internal optimization of the CLR. It probably caches the content of the file in memory between the two calls so that ToObservable
can pull the content much faster...
Edit: Oh, the good colleague with the crazy nickname eeh ... @sixlettervariables was faster and he's probably right: it's rather the OS who's optimizing than the CLR.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With