I maintain legacy C code where, in many places, small arrays such as int a[32]; are followed by a memset(a, 0, sizeof a); to zero-initialize them.
I'm thinking of refactoring this into int a[32] = {0}; and removing the memset.
The question is: does using a zero initializer generally result in faster code than calling memset?
It depends on your compiler. It shouldn't be any slower than calling memset() (because calling memset() is one option available to the compiler).
The initializer is easier to read than imperatively overwriting the array; it also adapts well if the element type is changed to something where all-bit-zero isn't what you want.
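The point about element types can be illustrated with a minimal sketch. The struct and function names below are hypothetical and do not come from the code base in the question. With `= {0}`, C gives every member the zero value of its own type (0, 0.0, a null pointer), whereas memset() only writes all-bits-zero bytes, which the C standard does not promise to be a null pointer or 0.0, even though it is on mainstream platforms.

```c
#include <stddef.h>

/* Hypothetical struct for illustration; not from the original code base. */
struct sample {
    int    count;
    double ratio;
    char  *name;
};

/* Returns 1 if every element produced by the `= {0}` initializer holds
   the zero value of its type (0, 0.0, null pointer), otherwise 0. */
int initializer_gives_typed_zeros(void)
{
    /* Members and elements without an explicit initializer are
       zero-initialized as well. */
    struct sample s[4] = {0};

    for (size_t i = 0; i < sizeof s / sizeof s[0]; i++) {
        if (s[i].count != 0 || s[i].ratio != 0.0 || s[i].name != NULL)
            return 0;
    }
    return 1;
}
```

If the element type later changes from int to something like this struct, the initializer line stays correct without modification.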
As an experiment, let's see what GCC does with this:
#include <string.h>

int f1()
{
    int a[32] = {0};
    return a[31];
}

int f2()
{
    int a[32];
    memset(a, 0, sizeof a);
    return a[31];
}
Compiling with gcc -S -std=c11 gives:
f1:
.LFB0:
.file 1 "40786375.c"
.loc 1 4 0
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $8, %rsp
.loc 1 5 0
leaq -128(%rbp), %rdx
movl $0, %eax
movl $16, %ecx
movq %rdx, %rdi
rep stosq
.loc 1 6 0
movl -4(%rbp), %eax
.loc 1 7 0
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
f2:
.LFB1:
.loc 1 10 0
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
addq $-128, %rsp
.loc 1 12 0
leaq -128(%rbp), %rax
movl $128, %edx
movl $0, %esi
movq %rax, %rdi
call memset@PLT
.loc 1 13 0
movl -4(%rbp), %eax
.loc 1 14 0
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
showing that f1() uses rep stosq for the initializer, whereas f2() has the function call, exactly like the C code. It's quite likely that memset() has a more efficient vectorized implementation for large arrays, but for small arrays like this, any benefits would likely be outweighed by the function call overhead.
If we declare a as volatile, we get to see what happens with optimizations enabled (gcc -S -std=c11 -O3):
f1:
.LFB4:
.cfi_startproc
subq $16, %rsp
.cfi_def_cfa_offset 24
xorl %eax, %eax
movl $16, %ecx
leaq -120(%rsp), %rdi
rep stosq
movl 4(%rsp), %eax
addq $16, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
f2:
.LFB5:
.cfi_startproc
subq $16, %rsp
.cfi_def_cfa_offset 24
xorl %eax, %eax
movl $16, %ecx
leaq -120(%rsp), %rdx
movq %rdx, %rdi
rep stosq
movl 4(%rsp), %eax
addq $16, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
You can see that the two functions now compile to identical code.
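Before refactoring, it may be worth confirming that both forms leave an int array in the same state. The following is a rough sketch under that assumption; the helper names and the N macro are invented for illustration and are not part of the experiment above.

```c
#include <string.h>

#define N 32  /* same size as the arrays in the question */

/* Hypothetical helper: copies out an array zeroed by an initializer. */
void zero_by_initializer(int out[N])
{
    int a[N] = {0};
    memcpy(out, a, sizeof a);
}

/* Hypothetical helper: copies out an array zeroed by memset(). */
void zero_by_memset(int out[N])
{
    int a[N];
    memset(a, 0, sizeof a);
    memcpy(out, a, sizeof a);
}

/* Returns 1 if both approaches yield all-zero contents, otherwise 0. */
int same_result(void)
{
    int x[N], y[N];

    zero_by_initializer(x);
    zero_by_memset(y);

    for (int i = 0; i < N; i++) {
        if (x[i] != 0 || y[i] != 0)
            return 0;
    }
    return 1;
}
```

This only checks behavior, not speed. For timing, inspecting the generated assembly as done above is usually more reliable than a micro-benchmark on arrays this small.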