
Speed of memcpy() greatly influenced by different ways of malloc()

I wrote a program to test the speed of memcpy(). However, the way the memory is allocated greatly influences the speed.

CODE

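#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* memcpy() */

int main(int argc, char *argv[]){
    unsigned char *pbuff_1;
    unsigned char *pbuff_2;
    unsigned long iters = 1000*1000;

    if(argc < 3){
        fprintf(stderr, "usage: %s <type: 0|1> <size in KB>\n", argv[0]);
        return 1;
    }

    int type = atoi(argv[1]);
    size_t buff_size = (size_t)atoi(argv[2])*1024;

    if(type == 1){
        /* one allocation, split into two adjacent halves */
        pbuff_1 = malloc(2*buff_size);
        pbuff_2 = pbuff_1 ? pbuff_1+buff_size : NULL;
    }else{
        /* two independent allocations */
        pbuff_1 = malloc(buff_size);
        pbuff_2 = malloc(buff_size);
    }
    if(pbuff_1 == NULL || pbuff_2 == NULL){
        fprintf(stderr, "malloc failed\n");
        return 1;
    }

    for(unsigned long i = 0; i < iters; ++i){
        memcpy(pbuff_2, pbuff_1, buff_size);
    }

    /* with type 1, pbuff_2 points into pbuff_1's block and must not be freed */
    free(pbuff_1);
    if(type != 1){
        free(pbuff_2);
    }
    return 0;
}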

The OS is linux-2.6.35 and the compiler is GCC-4.4.5 with options "-std=c99 -O3".

Results on my computer (4 KB memcpy, 1 million iterations; the first argument is 1 for a single malloc() split in two, 0 for two separate malloc() calls):

time ./test.test 1 4

real    0m0.128s
user    0m0.120s
sys     0m0.000s

time ./test.test 0 4

real    0m0.422s
user    0m0.420s
sys     0m0.000s

This question is related to a previous question:

Why does the speed of memcpy() drop dramatically every 4KB?

UPDATE

The cause appears to be related to the GCC compiler. I compiled and ran this program with different versions of GCC:

GCC version      4.1.3       4.4.5       4.6.3
Time used (1)    0m0.183s    0m0.128s    0m0.110s
Time used (0)    0m1.788s    0m0.422s    0m0.108s

It seems GCC is getting smarter.

foool, asked Jan 13 '14 10:01



1 Answer

The specific addresses returned by malloc are chosen by the implementation and are not always optimal for the calling code. You already know that the speed of moving memory around depends greatly on cache and page effects.

Here, the specific pointers returned by malloc are not known. You could print them out using printf("%p", (void *)ptr). What is known, however, is that using just one malloc for both blocks surely avoids page and cache waste between the two blocks. That may already be the reason for the speed difference.
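To make the suggestion about printing the pointers concrete, here is a minimal sketch of what such a check might look like. The helper names (page_offset, byte_distance, report_buffers) are made up for illustration, and the 4 KiB page size is an assumption, not something taken from the question.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 4 KiB pages. On Linux, sysconf(_SC_PAGESIZE) gives the real value. */
#define ASSUMED_PAGE_SIZE 4096u

/* Hypothetical helper: offset of p within its (assumed 4 KiB) page. */
static unsigned page_offset(const void *p)
{
    return (unsigned)((uintptr_t)p % ASSUMED_PAGE_SIZE);
}

/* Hypothetical helper: signed byte distance from src to dst.
 * Goes through uintptr_t because subtracting pointers into two
 * different malloc() blocks directly is undefined behaviour. */
static long long byte_distance(const void *src, const void *dst)
{
    return (long long)(intptr_t)((uintptr_t)dst - (uintptr_t)src);
}

/* Print both addresses, their page offsets, and how far apart they are. */
static void report_buffers(const void *src, const void *dst)
{
    printf("src = %p (page offset %u)\n", (void *)src, page_offset(src));
    printf("dst = %p (page offset %u)\n", (void *)dst, page_offset(dst));
    printf("dst - src = %lld bytes\n", byte_distance(src, dst));
}
```

Calling report_buffers(pbuff_1, pbuff_2) right after the allocations in the question's program, once with type 1 and once with type 0, would show whether the two layouts differ in page offset or spacing. It only reports the layout; it does not by itself prove that the layout causes the timing difference.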

Peter G., answered Nov 15 '22 10:11
