I am working on a soft-realtime event processing system. I would like to minimise the number of calls in my code that have non-deterministic timing. I need to construct a message that consists of strings, numbers, timestamps and GUIDs, probably a std::vector of boost::variants.
I have always wanted to use alloca in past code of a similar nature. However, when one looks into systems programming literature there are always massive cautions against this function call. Personally I can't think of a server-class machine in the last 15 years that doesn't have virtual memory, and I know for a fact that the Windows stack grows a virtual-memory page at a time, so I assume Unices do as well. There is no brick wall here (anymore); the stack is just as likely to run out of space as the heap, so what gives? Why aren't people going gaga over alloca? I can think of many use cases for responsible use of alloca (string processing, anyone?).
Anyhow, I decided to test the performance difference (see below) and there is a 5-fold speed difference between alloca and malloc (the test captures how I would use alloca). So, have things changed? Should we just throw caution to the wind and use alloca (wrapped in a std::allocator) whenever we can be absolutely certain of the lifetime of our objects?
I am tired of living in fear !
Edit:
OK, so there are limits: for Windows it is a link-time limit, and for Unix it seems to be tunable. It seems a page-aligned memory allocator is in order :D Does anyone know of a general-purpose portable implementation? :D
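One possible direction for such an allocator, sketched here under assumptions rather than taken from the post: if C++17 is available, std::pmr::monotonic_buffer_resource gives pointer-bump allocation out of a fixed buffer, which keeps the cheap allocation path of alloca while making the worst-case stack usage explicit. The names kArenaBytes and build_message_length are hypothetical, and the 64 KB budget is an arbitrary illustration.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <string>
#include <vector>

// Assumed budget for one message; tune it to the largest expected message.
constexpr std::size_t kArenaBytes = 64 * 1024;

// Hypothetical helper: builds `field_count` string fields inside a stack
// buffer and returns the total number of characters stored.
std::size_t build_message_length(std::size_t field_count) {
    // The buffer lives in this stack frame, like alloca memory, but its size
    // is fixed at compile time.
    std::array<std::byte, kArenaBytes> buffer;

    // Each allocation is a pointer bump inside `buffer`. If the buffer runs
    // out, the resource falls back to the default upstream resource
    // (normally the heap) instead of overflowing the stack.
    std::pmr::monotonic_buffer_resource arena(buffer.data(), buffer.size());

    std::pmr::vector<std::pmr::string> fields(&arena);
    fields.reserve(field_count);
    for (std::size_t i = 0; i < field_count; ++i) {
        // 100 characters is longer than the small-string buffer, so the
        // string's storage comes from the arena.
        fields.emplace_back(100, 'x');
    }

    std::size_t total = 0;
    for (const auto& f : fields) total += f.size();
    return total;
    // Everything is released at once when `arena` and `buffer` go out of scope.
}
```

Calling build_message_length(10) builds ten 100-character fields without touching malloc as long as the arena is large enough. Unlike alloca, the arena can be passed by pointer to helper functions, because the memory belongs to the frame that declares the buffer rather than to whichever function happens to allocate.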
Code:
    #include <stdlib.h>
    #include <time.h>
    #include <boost/date_time/posix_time/posix_time.hpp>
    #include <iostream>

    using namespace boost::posix_time;

    int random_string_size()
    {
        return ( (rand() % 1023) +1 );
    }

    int random_vector_size()
    {
        return ( (rand() % 31) +1);
    }

    void alloca_test()
    {
        int vec_sz = random_vector_size();

        void ** vec = (void **) alloca(vec_sz * sizeof(void *));

        for(int i = 0 ; i < vec_sz ; i++)
        {
            vec[i] = alloca(random_string_size());
        }
    }

    void malloc_test()
    {
        int vec_sz = random_vector_size();

        void ** vec = (void **) malloc(vec_sz * sizeof(void *));

        for(int i = 0 ; i < vec_sz ; i++)
        {
            vec[i] = malloc(random_string_size());
        }

        for(int i = 0 ; i < vec_sz ; i++)
        {
            free(vec[i]);
        }

        free(vec);
    }

    int main()
    {
        srand( time(NULL) );
        ptime now;
        ptime after;

        int test_repeat = 100;
        int times = 100000;

        time_duration alloc_total;
        for(int ii=0; ii < test_repeat; ++ii)
        {
            now = microsec_clock::local_time();
            for(int i =0 ; i < times ; ++i)
            {
                alloca_test();
            }
            after = microsec_clock::local_time();

            alloc_total += after -now;
        }

        std::cout << "alloca_time: " << alloc_total/test_repeat << std::endl;

        time_duration malloc_total;
        for(int ii=0; ii < test_repeat; ++ii)
        {
            now = microsec_clock::local_time();
            for(int i =0 ; i < times ; ++i)
            {
                malloc_test();
            }
            after = microsec_clock::local_time();
            malloc_total += after-now;
        }

        std::cout << "malloc_time: " << malloc_total/test_repeat << std::endl;
    }
output:
    hassan@hassan-desktop:~/test$ ./a.out
    alloca_time: 00:00:00.056302
    malloc_time: 00:00:00.260059
    hassan@hassan-desktop:~/test$ ./a.out
    alloca_time: 00:00:00.056229
    malloc_time: 00:00:00.256374
    hassan@hassan-desktop:~/test$ ./a.out
    alloca_time: 00:00:00.056119
    malloc_time: 00:00:00.265731
--Edit: Results on home machine, clang, and google perftools--
    G++ without any optimization flags
    alloca_time: 00:00:00.025785
    malloc_time: 00:00:00.106345

    G++ -O3
    alloca_time: 00:00:00.021838
    malloc_time: 00:00:00.111039

    Clang no flags
    alloca_time: 00:00:00.025503
    malloc_time: 00:00:00.104551

    Clang -O3 (alloca becomes magically faster)
    alloca_time: 00:00:00.013028
    malloc_time: 00:00:00.101729

    g++ -O3 perftools
    alloca_time: 00:00:00.021137
    malloc_time: 00:00:00.043913

    clang++ -O3 perftools (The sweet spot)
    alloca_time: 00:00:00.013969
    malloc_time: 00:00:00.044468
The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller.
alloca() is very useful if you can't use a standard local variable because its size would need to be determined at runtime and you can absolutely guarantee that the pointer you get from alloca() will NEVER be used after this function returns.
alloca(): Defined in alloca.h, this function allocates local storage in a function. It returns a pointer to size bytes of memory. The default implementation returns an eight-byte aligned block of memory on the stack.
Using alloca wastes very little space and is very fast. (It is open-coded by the GNU C compiler.) Since alloca does not have separate pools for different sizes of blocks, space used for any size block can be reused for any other size. alloca does not cause memory fragmentation.
Well, first of all, just because there is a lot of virtual memory doesn't mean your process will be allowed to fill it. On *nix there are stack size limits, whereas the heap is a lot more forgiving.
If you're only going to be allocating a few hundred or a few thousand bytes, sure, go ahead. Anything beyond that is going to depend on what limits (ulimit) are in place on any given system, and that's just a recipe for disaster.
Why is the use of alloca() not considered good practice?
On my development box at work (Gentoo) I have a default stack size limit of 8192 KB. That's not very big, and if alloca overflows the stack then the behavior is undefined.
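To see what limit applies on a given box from inside the program (the same value `ulimit -s` reports), something along these lines should work on POSIX systems. This is a minimal sketch: soft_stack_limit_bytes and print_stack_limit are hypothetical helper names, and the value describes the main thread's stack, since stacks of other threads are sized when the thread is created.

```cpp
#include <sys/resource.h>  // POSIX only: getrlimit, RLIMIT_STACK
#include <cstdio>

// Returns the soft stack size limit in bytes, or 0 if the limit is
// unlimited or could not be queried.
unsigned long long soft_stack_limit_bytes() {
    struct rlimit lim{};
    if (getrlimit(RLIMIT_STACK, &lim) != 0) return 0;
    if (lim.rlim_cur == RLIM_INFINITY) return 0;
    return static_cast<unsigned long long>(lim.rlim_cur);
}

// Prints the limit in KB, the unit that `ulimit -s` uses.
void print_stack_limit() {
    unsigned long long bytes = soft_stack_limit_bytes();
    if (bytes == 0) {
        std::printf("stack limit: unlimited or unavailable\n");
    } else {
        std::printf("stack limit: %llu KB\n", bytes / 1024);
    }
}
```

On a machine configured like the one described above, print_stack_limit() would be expected to report 8192 KB, but the number depends entirely on the local configuration, which is exactly why large alloca calls are hard to make portable.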
I think you need to be a little bit careful in understanding what alloca actually is. Unlike malloc, which goes to the heap and searches through buckets and linked lists of various buffers, alloca simply takes your stack register (ESP on x86) and moves it to create a "hole" on your thread's stack where you can store whatever you want. That's why it's uber-fast: it takes just one (or a few) assembly instructions.
So, as others pointed out, it's not the "virtual memory" that you need to worry about but the size reserved for the stack. Although others limit themselves to a "few hundred bytes", as long as you know your application and are careful about it, you can go further: we've allocated up to 256 KB without any problems (the default stack size, at least for Visual Studio, is 1 MB, and you can always increase it if you need to).
Also, you really can't use alloca as a general-purpose allocator (i.e. by wrapping it inside another function), because whatever memory alloca allocates for you will be gone when the stack frame of the calling function is popped (i.e. when that function exits).
I've also seen some people say that alloca is not completely cross-platform compatible, but if you are writing a specific application for a specific platform and you have the option of using alloca, sometimes it's the best option you have, as long as you understand the implications of increasing stack usage.