I'm migrating my application from Windows 7 to Windows 10.
All functions work without any changes, but execution is slower than on Windows 7.
Object construction/destruction seemed to be slow, so I wrote a simple
benchmark program for malloc() and free(), shown below.
for (int i = 0; i < 100; i++)
{
QueryPerformanceCounter(&gStart);
p = malloc(size);
free(p);
QueryPerformanceCounter(&gEnd);
printf("%d, %g\n", i, gEnd.QuadPart-gStart.QuadPart);
if (p == NULL)
printf("ERROR\n", size);
}
I ran this program on both Windows 7 and Windows 10 on the same PC.
I measured malloc() and free() performance for data sizes of 1, 100, 1000, 10000, 100000, 1000000, 10000000 and 100000000 bytes.
In all of the above cases, Windows 10 is slower than Windows 7.
In particular, Windows 10 is more than ten times slower than Windows 7 when the data size is 10000000 or 100000000 bytes.
When data size is 10000000 bytes
When data size is 100000000 bytes
Do you have any suggestions to improve this on Windows 10?
I've experimented with the following on Windows 10, but unfortunately performance did not improve.
Here is the source code. (updated Feb 15th)
#include "stdafx.h"
#define START_TIME QueryPerformanceCounter(&gStart);
#define END_TIME QueryPerformanceCounter(&gEnd);
#define PRT_FMT(fmt, ...) printf(fmt, __VA_ARGS__);
#define PRT_TITLE(fmt, ...) printf(fmt, __VA_ARGS__); gTotal.QuadPart = 0;
#define PRT_RESULT printf(",%lld", gEnd.QuadPart-gStart.QuadPart); gTotal.QuadPart+=(gEnd.QuadPart-gStart.QuadPart);
#define PRT_END printf("\n");
//#define PRT_END printf(",total,%d,%d\n", gTotal.QuadPart, gTotal.QuadPart*1000000/gFreq.QuadPart);
LARGE_INTEGER gStart;
LARGE_INTEGER gEnd;
LARGE_INTEGER gTotal;
LARGE_INTEGER gFreq;
void
t_Empty()
{
PRT_TITLE("02_Empty");
START_TIME
END_TIME; PRT_RESULT
PRT_END
}
void
t_Sleep1234()
{
PRT_TITLE("01_Sleep1234");
START_TIME
Sleep(1234);
END_TIME; PRT_RESULT
PRT_END
}
void*
t_Malloc_Free(size_t size)
{
void* pVoid;
PRT_TITLE("Malloc_Free_%d", size);
for(int i=0; i<100; i++)
{
START_TIME
pVoid = malloc(size);
free(pVoid);
END_TIME; PRT_RESULT
if(pVoid == NULL)
{
PRT_FMT("ERROR size(%d)", size);
}
}
PRT_END
return pVoid;
}
int _tmain(int argc, _TCHAR* argv[])
{
int i;
QueryPerformanceFrequency(&gFreq);
PRT_FMT("00_QueryPerformanceFrequency, %lld\n", gFreq.QuadPart);
t_Empty();
t_Sleep1234();
for(i=0; i<10; i++)
{
t_Malloc_Free(1);
t_Malloc_Free(100);
t_Malloc_Free(1000); //1KB
t_Malloc_Free(10000);
t_Malloc_Free(100000);
t_Malloc_Free(1000000); //1MB
t_Malloc_Free(10000000); //10MB
t_Malloc_Free(100000000); //100MB
}
return 0;
}
Results in my environment (built with VS2010 on Windows 7), for the 100 MB case:
QPC count in windows 7 : 11.52 (4.03usec)
QPC count in windows 10 : 973.28 (341usec)
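For reference, a QPC tick count converts to time by dividing it by the frequency that QueryPerformanceFrequency() reports. A minimal sketch in portable C follows; the helper name ticks_to_usec is hypothetical and not part of the program above, and long long stands in for LONGLONG.

```c
/* Convert a QueryPerformanceCounter tick delta to microseconds.
   ticks_to_usec is a hypothetical helper for illustration only.
   freq is the ticks-per-second value from QueryPerformanceFrequency(). */
static double ticks_to_usec(long long ticks, long long freq)
{
    if (freq <= 0)
        return -1.0;    /* invalid frequency: report an error value */
    return (double)ticks * 1000000.0 / (double)freq;
}
```

In the benchmark, a sample could then be reported with something like ticks_to_usec(gEnd.QuadPart - gStart.QuadPart, gFreq.QuadPart), printed with a floating-point format such as %.2f.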
Inevitably, yes. Although many aspects of Windows 10 are improved over Windows 7, the additional baggage and features mean it will run slower on the same hardware. Your best option is to add more RAM if possible; Windows 10 seems to run well with 8 GB of RAM.
Although Windows 7 still outperforms Windows 10 across a selection of apps, expect this to be short-lived as Windows 10 continues to receive updates. In the meantime, Windows 10 boots, sleeps, and wakes faster than its predecessors, even when loaded on an older machine.
One thing that may have some impact is that the internals of the QueryPerformanceCounter API have apparently changed from Windows 7 to Windows 8. https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408(v=vs.85).aspx
Windows 8, Windows 8.1, Windows Server 2012, and Windows Server 2012 R2 use TSCs as the basis for the performance counter. The TSC synchronization algorithm was significantly improved to better accommodate large systems with many processors.
More importantly, your benchmarking code is itself broken. QuadPart is of type LONGLONG, as is the expression gEnd.QuadPart-gStart.QuadPart. But you print this expression with the %g format specifier, which expects a double. So you invoke undefined behavior, and the output you have been reading is complete nonsense.
Similarly, printf("ERROR\n", size); is another bug.
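As a rough illustration of a matching format specifier, the sketch below formats a 64-bit tick delta with %lld. The helper name format_ticks is hypothetical, long long stands in for LONGLONG, and snprintf is assumed to be available (C99; older MSVC versions such as VS2010 offer _snprintf instead).

```c
#include <stdio.h>

/* Format the 64-bit difference end - start into buf using %lld,
   the conversion that matches long long. format_ticks is a
   hypothetical helper. Returns the number of characters written
   (excluding the terminator), or a negative value on error. */
static int format_ticks(char *buf, size_t len, long long start, long long end)
{
    return snprintf(buf, len, "%lld", end - start);
}
```

The same specifier works directly in the loop, for example printf("%d, %lld\n", i, gEnd.QuadPart - gStart.QuadPart).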
That being said, operating systems often don't do the actual heap allocation until that memory area is actually used, meaning that there is probably no actual allocation taking place in your program.
To counter this behavior during benchmarking, you have to actually use the memory. For example, you could add something like this to ensure that the allocation is actually taking place:
p = malloc(size);
if (p != NULL)
    ((volatile char*)p)[0] = (char)i; /* volatile write: forces the block to be touched */
free(p);
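Writing a single byte only touches the first page of the block. To make the benchmark cover the whole allocation, one option is to write to every page. This is a minimal sketch; the helper name touch_pages is hypothetical, and the page size is passed in by the caller (4096 is a common value on x86/x64, but real code should query it, for example with GetSystemInfo).

```c
#include <stddef.h>

/* Store one byte into every page spanned by the block so that the OS
   has to back all of it with real memory. touch_pages is a hypothetical
   helper. Returns the number of stores performed. */
static size_t touch_pages(void *p, size_t size, size_t page_size)
{
    volatile unsigned char *bytes = (volatile unsigned char *)p;
    size_t stores = 0;
    size_t offset;

    if (p == NULL || size == 0 || page_size == 0)
        return 0;
    for (offset = 0; offset < size; offset += page_size) {
        bytes[offset] = 1;      /* volatile store cannot be optimized away */
        stores++;
    }
    /* The block may not be page-aligned, so its last byte can sit in a
       page the loop did not reach; touch it explicitly. */
    bytes[size - 1] = 1;
    stores++;
    return stores;
}
```

Calling touch_pages(p, size, 4096) between malloc() and free() inside the timed region would make the measurement include the cost of actually committing the memory, which is a different quantity from the bare malloc()/free() call time.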