
Why are my memory benchmarks giving strange results?

I have recently been running some basic benchmarks written in C# to determine why some seemingly identical HyperV remote workstations seem to be running far slower than others. Their results on most of the basic tests I am running have been totally identical, but the results from a basic memory access benchmark (specifically, the time taken to initialise a two-dimensional 1000x1000 array of doubles to 0) differ by a factor of 40.

To investigate further, I have run several other experiments to narrow down the issue. Running the same test with an exponentially increasing array size (until an OutOfMemoryException occurs) shows no difference between the various remotes until the array size exceeds 1,000,000 elements, at which point a difference of a factor of around 40 appears immediately. In fact, testing incremental array sizes, the time taken to initialise increases proportionally with array size up to an array size of exactly 999,999; then, as the array size reaches 1000x1000, the time taken on the 'slow' remotes increases by 900%, while on the 'fast' remotes it decreases by 70%. From there, it continues to scale proportionally. The same phenomenon also happens with array sizes of 1,000,000 x 1 and 1 x 1,000,000, though to a much smaller extent (changes of +50% and -30% instead).

Interestingly, changing the data type used for the experiment to floats appears to completely eliminate this phenomenon. No difference occurs between the remotes in any test, and the time taken appears to be entirely proportional even across the 1000x1000 and 2000x2000 breakpoints. Another interesting observation is that the behaviour of the local workstation I am using mirrors that of the slower remotes.

Does anybody have any idea what settings in the system configuration might be causing this effect and how they might be changed, or what might be done to further debug the issue?
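For reference, a minimal sketch of the measurement described above (the question does not include code, so the timing mechanism and array shape here are assumptions):

```csharp
using System;
using System.Diagnostics;

class InitBenchmark
{
    static void Main()
    {
        const int n = 1000;                  // 1000x1000 doubles, roughly 8 MB
        var sw = Stopwatch.StartNew();
        var a = new double[n, n];            // only reserves address space
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                a[i, j] = 0.0;               // first touch faults each page in
        sw.Stop();
        Console.WriteLine($"first pass: {sw.Elapsed.TotalMilliseconds:F2} ms");
    }
}
```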

Deso Lution asked Mar 13 '16 18:03



1 Answer

You'll have to keep in mind what you are really testing. Which is most certainly not a .NET program's ability to assign array elements. That is very fast and normally proceeds at memory-bus bandwidth for a big array, typically ~37 gigabytes/second depending on the kind of RAM the machine has, down to 5 GB/sec on the pokiest kind you could run into today (slow-clocked DDR2 on an old machine).

The new keyword only allocates address space on a demand-paged virtual memory operating system like Windows. Just numbers to the processor, one each for every 4096 bytes.

Once you start assigning elements the first time, the demand-paged feature kicks in and your code forces the operating system to allocate RAM for the array. The array element assignment triggers a page fault, one for each 4096 bytes in the array. Or one for every 512 doubles in your case. The cost of handling the page fault is included in your measurement.
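As a quick sanity check of the arithmetic in the paragraph above (a sketch; the variable names are mine, not from the answer):

```csharp
using System;

class PageFaultCount
{
    static void Main()
    {
        const int pageSize = 4096;                       // bytes per OS page
        const int doubleSize = sizeof(double);           // 8 bytes
        int doublesPerPage = pageSize / doubleSize;      // 512 doubles per page
        long arrayBytes = 1000L * 1000L * doubleSize;    // 8,000,000 bytes
        long faults = (arrayBytes + pageSize - 1) / pageSize;  // pages touched
        Console.WriteLine($"{doublesPerPage} doubles per page, ~{faults} page faults");
        // prints "512 doubles per page, ~1954 page faults"
    }
}
```

So a 1000x1000 array of doubles incurs on the order of two thousand page faults on its first pass, each of which lands in one of the scenarios listed below.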

That's smooth sailing only when the OS has a zero-initialized RAM page ready to be used. Usually takes a fat half a microsecond, give or take. Still a lot of time to a processor; it will be stalled while the OS updates the page mapping. Keep in mind that this only happens on the first element access, subsequent ones are fast since the RAM page will still be available. Usually.

It is not smooth sailing when such a RAM page is not available. Then the OS has to pillage one. There are as many as 4 distinct scenarios in your case that I can think of:

  • a page is available but not yet zero-initialized by the low priority zero page thread. Should be quick, it doesn't take much effort.
  • a page needs to be stolen from another process and the content of that page does not need to be preserved. Happens for pages that previously contained code for example. Pretty quick as well.
  • a page needs to be stolen and its content needs to be preserved in the paging file. Happens for pages that previously contained data for example. A hard page fault, that one hurts. The processor will be stalled while the disk write takes place.
  • specific to your scenario, the HyperV manager decides that it is time to borrow more RAM from the host operating system. All of the previous bullets apply to that OS, plus the overhead of the OS interaction. No real idea how much overhead that entails, ought to be painful as well.

Which of those bullets you are going to hit is very, very unpredictable. Most of all because it isn't just your program that is involved, whatever else runs on the machine affects it as well. And there's a memory-effect, something like writing a big file just before you start the test will have a drastic side-effect, caused by RAM pages being used by the file system cache that are waiting for the disk. Or another process having an allocation burst and draining the zero page queue. Or the memory bus getting saturated, pretty easy to do, could be affected by the host OS as well. Etcetera.

The long and short of it is that profiling this code just is not very meaningful. Anything can and will happen and you don't have a decent way to predict it. Or a good way to do anything about it, other than giving the VM gobs of RAM and not running anything else on it :) Profiling results for the second pass through the array are going to be a lot more stable and meaningful, since the OS is no longer involved.
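One way to separate the demand-paging cost from the raw assignment cost, along the lines the answer suggests, is to time the same fill twice over the same array (a sketch; the helper name is mine):

```csharp
using System;
using System.Diagnostics;

class TwoPassTiming
{
    // Fill every element and return the elapsed Stopwatch ticks.
    public static long TimeFillTicks(double[,] a)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < a.GetLength(0); i++)
            for (int j = 0; j < a.GetLength(1); j++)
                a[i, j] = 1.0;
        sw.Stop();
        return sw.ElapsedTicks;
    }

    static void Main()
    {
        var a = new double[1000, 1000];
        long first = TimeFillTicks(a);   // includes page-fault handling
        long second = TimeFillTicks(a);  // pages already mapped: steadier number
        Console.WriteLine($"first={first} ticks, second={second} ticks");
    }
}
```

On most runs the second number is both smaller and far less variable; the gap between the two is roughly the OS's share of the first pass.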

Hans Passant answered Sep 23 '22 13:09