
malloc in Windows 10 is slower than in Windows 7


I'm migrating my application from Windows 7 to Windows 10.
All functions work without any changes, but execution is slower than on Windows 7.
Object construction and destruction seemed to be slow, so I wrote a simple benchmark for malloc() and free(), shown below.

for (int i = 0; i < 100; i++)
{
  QueryPerformanceCounter(&gStart);
  p = malloc(size);
  free(p);
  QueryPerformanceCounter(&gEnd);
  printf("%d, %g\n", i, gEnd.QuadPart-gStart.QuadPart);
  if (p == NULL)
    printf("ERROR\n", size);
}

I ran this program on both Windows 7 and Windows 10 on the same PC, measuring malloc() and free() performance for data sizes of 1, 100, 1000, 10000, 100000, 1000000, 10000000 and 100000000 bytes.
In all of these cases, Windows 10 is slower than Windows 7.
In particular, Windows 10 is more than ten times slower than Windows 7 when the data size is 10000000 or 100000000 bytes.

When data size is 10000000 bytes

  • Windows 7 : 0.391392 msec
  • Windows 10 : 4.254411 msec

When data size is 100000000 bytes

  • Windows 7 : 0.602178 msec
  • Windows 10 : 38.713946 msec

Do you have any suggestions to improve it on windows 10?

I've tried the following on Windows 10, but unfortunately performance did not improve.

  • Disabled superfetch
  • Disabled Ndu.sys
  • Disk cleanup

Here is the source code. (updated Feb 15th)

#include "stdafx.h"

#define START_TIME  QueryPerformanceCounter(&gStart);
#define END_TIME    QueryPerformanceCounter(&gEnd);

#define PRT_FMT(fmt, ...)   printf(fmt, __VA_ARGS__); 
#define PRT_TITLE(fmt, ...) printf(fmt, __VA_ARGS__); gTotal.QuadPart = 0;
#define PRT_RESULT  printf(",%d", gEnd.QuadPart-gStart.QuadPart); gTotal.QuadPart+=(gEnd.QuadPart-gStart.QuadPart);
#define PRT_END printf("\n");
//#define PRT_END       printf(",total,%d,%d\n", gTotal.QuadPart, gTotal.QuadPart*1000000/gFreq.QuadPart);


LARGE_INTEGER gStart;
LARGE_INTEGER gEnd;
LARGE_INTEGER gTotal;
LARGE_INTEGER gFreq;

void
t_Empty()
{
    PRT_TITLE("02_Empty");
    START_TIME
    END_TIME; PRT_RESULT
    PRT_END
}
void
t_Sleep1234()
{
    PRT_TITLE("01_Sleep1234");
    START_TIME
        Sleep(1234);
    END_TIME; PRT_RESULT
    PRT_END
}

void*
t_Malloc_Free(size_t size)
{
    void* pVoid;

    PRT_TITLE("Malloc_Free_%d", size);
    for(int i=0; i<100; i++)
    {
        START_TIME
        pVoid = malloc(size);
        free(pVoid);
        END_TIME; PRT_RESULT
        if(pVoid == NULL)
        {
            PRT_FMT("ERROR size(%d)", size);
        }

    }
    PRT_END

    return pVoid;
}

int _tmain(int argc, _TCHAR* argv[])
{
    int i;
    QueryPerformanceFrequency(&gFreq);
    PRT_FMT("00_QueryPerformanceFrequency, %lld\n", gFreq.QuadPart);

    t_Empty();
    t_Sleep1234();

    for(i=0; i<10; i++)
    {
        t_Malloc_Free(1);
        t_Malloc_Free(100);
        t_Malloc_Free(1000);        //1KB
        t_Malloc_Free(10000);
        t_Malloc_Free(100000);
        t_Malloc_Free(1000000);     //1MB
        t_Malloc_Free(10000000);    //10MB
        t_Malloc_Free(100000000);   //100MB
    }
    return 0;
}

Results in my environment (built with VS2010 on Windows 7), for the 100 MB case:

  • QPC count in windows 7 : 11.52 (4.03usec)

  • QPC count in windows 10 : 973.28 (341msec)

pleiades92 asked Oct 06 '16 12:10



1 Answer

One thing that may have some impact is that the internals of the QueryPerformanceCounter API have apparently changed from Windows 7 to Windows 8. https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408(v=vs.85).aspx

Windows 8, Windows 8.1, Windows Server 2012, and Windows Server 2012 R2 use TSCs as the basis for the performance counter. The TSC synchronization algorithm was significantly improved to better accommodate large systems with many processors.
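
To check whether the timer itself accounts for part of the difference, the cost of one timer read can be estimated by reading it many times back to back. This is a minimal sketch, not a definitive measurement: read_timer and timer_overhead_ticks are hypothetical names, and the non-Windows branch (C11 timespec_get, in nanoseconds) is an assumption added only so that the sketch also builds off Windows.

```c
#include <assert.h>
#include <time.h>

#ifdef _WIN32
#include <windows.h>
/* Windows: raw QueryPerformanceCounter ticks, as in the question. */
long long read_timer(void)
{
    LARGE_INTEGER t;
    QueryPerformanceCounter(&t);
    return t.QuadPart;
}
#else
/* Assumption for other platforms: C11 timespec_get, in nanoseconds. */
long long read_timer(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
#endif

/* Average cost, in timer ticks, of one timer read,
   estimated from `reads` consecutive reads. */
double timer_overhead_ticks(int reads)
{
    long long first, last = 0;
    int i;

    assert(reads > 0);
    first = read_timer();
    for (i = 0; i < reads; i++)
        last = read_timer();
    return (double)(last - first) / reads;
}
```

If timer_overhead_ticks(100000) comes out similar on both systems, the timer change is unlikely to explain a tenfold gap.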


More importantly, your benchmarking code in itself is broken. QuadPart is of type LONGLONG, as is the expression gEnd.QuadPart-gStart.QuadPart. But you print this expression with the %g format specifier which expects a double. So you invoke undefined behavior and the output you have been reading is complete nonsense.

Similarly, printf("ERROR\n", size); is another bug.
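
A minimal sketch of how the tick difference could be printed correctly: the difference is a 64-bit integer, so it is printed with %lld, and it is converted to double explicitly when a fractional time is wanted. The names ticks_to_usec and print_sample are hypothetical, and long long stands in for LONGLONG on the assumption that both are the same 64-bit signed type with your compiler.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a tick delta to microseconds.
   freq is ticks per second, as reported by QueryPerformanceFrequency. */
double ticks_to_usec(long long ticks, long long freq)
{
    assert(freq > 0);
    return (double)ticks * 1e6 / (double)freq;
}

/* Print one sample as "index, ticks, microseconds". */
void print_sample(int i, long long start, long long end, long long freq)
{
    long long delta = end - start;   /* 64-bit integer: matches %lld */

    printf("%d, %lld, %.3f\n", i, delta, ticks_to_usec(delta, freq));
}
```

With the globals from the question, a call would look like print_sample(i, gStart.QuadPart, gEnd.QuadPart, gFreq.QuadPart);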


That being said, operating systems often don't commit physical memory for a heap allocation until that memory area is actually used. This means that your program probably never triggers the real allocation work.

To counter this behavior during benchmarking, you have to actually use the memory. For example, you could add something like this to ensure that the allocation is actually taking place:

p = malloc(size);
if (p != NULL)
{
  volatile char x = (char)i;
  ((char*)p)[0] = x;  /* write to the block so the allocation is really used */
}
free(p);
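
On a system that commits memory lazily, writing a single byte would be expected to fault in only the page that contains it. As a rough sketch that goes one step further, the benchmark could write one byte into every page of the block. The names now_ticks, touch_pages and time_malloc_touch_free are hypothetical, the 4096-byte page size is an assumption, and the non-Windows timer branch exists only so that the sketch also builds off Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#ifdef _WIN32
#include <windows.h>
/* Windows: raw QueryPerformanceCounter ticks, as in the question. */
long long now_ticks(void)
{
    LARGE_INTEGER t;
    QueryPerformanceCounter(&t);
    return t.QuadPart;
}
#else
/* Assumption for other platforms: C11 timespec_get, in nanoseconds. */
long long now_ticks(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
#endif

#define ASSUMED_PAGE_SIZE 4096u  /* assumption; query the real value if it matters */

/* Write one byte into every page of the block; returns the pages written. */
size_t touch_pages(void *block, size_t size, size_t page)
{
    volatile unsigned char *bytes = (volatile unsigned char *)block;
    size_t off, touched = 0;

    assert(page > 0);
    for (off = 0; off < size; off += page) {
        bytes[off] = 1;   /* volatile store: cannot be optimized away */
        touched++;
    }
    return touched;
}

/* Ticks spent on malloc + touch + free, or -1 if malloc failed. */
long long time_malloc_touch_free(size_t size)
{
    long long start = now_ticks();
    void *p = malloc(size);

    if (p == NULL)
        return -1;
    touch_pages(p, size, ASSUMED_PAGE_SIZE);
    free(p);
    return now_ticks() - start;
}
```

Calling time_malloc_touch_free(100000000) inside the question's loop and printing the result with %lld would time the allocation together with its first use, so expect larger numbers than for malloc() and free() alone on either system.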
Lundin answered Oct 14 '22 02:10