
Why are my memory benchmarks giving strange results?

I have recently been running some basic benchmarks written in C# to determine why some seemingly identical HyperV remote workstations seem to be running far slower than others. Their results on most of the basic tests I am running have been totally identical, but the results from a basic memory access benchmark (specifically, the time taken to initialise a two-dimensional 1000x1000 array of doubles to 0) differ by a factor of 40.

To investigate further, I have run several other experiments to narrow down the issue. Running the same test with an exponentially increasing array size (until an OutOfMemoryException occurs) shows no difference between the various remotes until the array size exceeds 1,000,000 elements, at which point a difference of a factor of around 40 appears immediately. In fact, testing incremental array sizes, the time taken to initialise increases proportionally with array size up to an array size of exactly 999,999; then, as the array size reaches 1000x1000, the time taken on the 'slow' remotes increases by 900%, while on the 'fast' remotes it decreases by 70%. From there, it continues to scale proportionally. The same phenomenon also happens with array sizes of 1,000,000 x 1 and 1 x 1,000,000, though to a much smaller extent (changes of +50% and -30% instead).

Interestingly, changing the data type used for the experiment to floats appears to completely eliminate this phenomenon. No difference occurs between the remotes in any test, and the time taken appears to be entirely proportional even across the 1000x1000 and 2000x2000 breakpoints. Another interesting observation is that the behaviour of the local workstation I am using mirrors that of the slower remotes.

Does anybody have any idea what settings in the system configuration might be causing this effect and how they might be changed, or what might be done to further debug the issue?
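For reference, a minimal sketch of the measurement described above (the question does not include code, so the timing mechanism and array shape here are assumptions):

```csharp
using System;
using System.Diagnostics;

class InitBenchmark
{
    static void Main()
    {
        const int n = 1000;                  // 1000x1000 doubles, roughly 8 MB
        var sw = Stopwatch.StartNew();
        var a = new double[n, n];            // only reserves address space
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                a[i, j] = 0.0;               // first touch faults each page in
        sw.Stop();
        Console.WriteLine($"first pass: {sw.Elapsed.TotalMilliseconds:F2} ms");
    }
}
```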

Deso Lution asked Mar 13 '16 18:03



1 Answer

You'll have to keep in mind what you are really testing. Which is most certainly not a .NET program's ability to assign array elements. That is very fast and normally proceeds at memory-bus bandwidth for a big array, typically ~37 gigabytes/second depending on the kind of RAM the machine has, down to 5 GB/sec on the pokiest kind you could run into today (slow-clocked DDR2 on an old machine).

The new keyword only allocates address space on a demand-paged virtual memory operating system like Windows. Just numbers to the processor, one each for every 4096 bytes.

Once you start assigning elements the first time, the demand-paged feature kicks in and your code forces the operating system to allocate RAM for the array. The array element assignment triggers a page fault, one for each 4096 bytes in the array. Or one for every 512 doubles in your case. The cost of handling the page fault is included in your measurement.
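As a quick sanity check of the arithmetic in the paragraph above (a sketch; the variable names are mine, not from the answer):

```csharp
using System;

class PageFaultCount
{
    static void Main()
    {
        const int pageSize = 4096;                       // bytes per OS page
        const int doubleSize = sizeof(double);           // 8 bytes
        int doublesPerPage = pageSize / doubleSize;      // 512 doubles per page
        long arrayBytes = 1000L * 1000L * doubleSize;    // 8,000,000 bytes
        long faults = (arrayBytes + pageSize - 1) / pageSize;  // pages touched
        Console.WriteLine($"{doublesPerPage} doubles per page, ~{faults} page faults");
        // prints "512 doubles per page, ~1954 page faults"
    }
}
```

So a 1000x1000 array of doubles incurs on the order of two thousand page faults on its first pass, each of which lands in one of the scenarios listed below.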

That's smooth sailing only when the OS has a zero-initialized RAM page ready to be used. Usually takes a fat half a microsecond, give or take. Still a lot of time to a processor; it will be stalled while the OS updates the page mapping. Keep in mind that this only happens on the first element access, subsequent ones are fast since the RAM page will still be available. Usually.

It is not smooth sailing when such a RAM page is not available. Then the OS has to pillage one. There are as many as 4 distinct scenarios in your case that I can think of:

  • a page is available but not yet zero-initialized by the low priority zero page thread. Should be quick, it doesn't take much effort.
  • a page needs to be stolen from another process and the content of that page does not need to be preserved. Happens for pages that previously contained code for example. Pretty quick as well.
  • a page needs to be stolen and its content needs to be preserved in the paging file. Happens for pages that previously contained data for example. A hard page fault, that one hurts. The processor will be stalled while the disk write takes place.
  • specific to your scenario, the HyperV manager decides that it is time to borrow more RAM from the host operating system. All of the previous bullets apply to that OS, plus the overhead of the OS interaction. No real idea how much overhead that entails, ought to be painful as well.

Which of those bullets you are going to hit is very, very unpredictable. Most of all because it isn't just your program that is involved, whatever else runs on the machine affects it as well. And there's a memory-effect, something like writing a big file just before you start the test will have a drastic side-effect, caused by RAM pages being used by the file system cache that are waiting for the disk. Or another process having an allocation burst and draining the zero page queue. Or the memory bus getting saturated, pretty easy to do, could be affected by the host OS as well. Etcetera.

The long and short of it is that profiling this code just is not very meaningful. Anything can and will happen and you don't have a decent way to predict it. Or a good way to do anything about it, other than giving the VM gobs of RAM and not running anything else on it :) Profiling results for the second pass through the array are going to be a lot more stable and meaningful, since the OS is no longer involved.
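One way to separate the demand-paging cost from the raw assignment cost, along the lines the answer suggests, is to time the same fill twice over the same array (a sketch; the helper name is mine):

```csharp
using System;
using System.Diagnostics;

class TwoPassTiming
{
    // Fill every element and return the elapsed Stopwatch ticks.
    public static long TimeFillTicks(double[,] a)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < a.GetLength(0); i++)
            for (int j = 0; j < a.GetLength(1); j++)
                a[i, j] = 1.0;
        sw.Stop();
        return sw.ElapsedTicks;
    }

    static void Main()
    {
        var a = new double[1000, 1000];
        long first = TimeFillTicks(a);   // includes page-fault handling
        long second = TimeFillTicks(a);  // pages already mapped: steadier number
        Console.WriteLine($"first={first} ticks, second={second} ticks");
    }
}
```

On most runs the second number is both smaller and far less variable; the gap between the two is roughly the OS's share of the first pass.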

Hans Passant answered Sep 23 '22 13:09