On the use and abuse of alloca


I am working on a soft-realtime event processing system, and I would like to minimise the number of calls in my code that have non-deterministic timing. I need to construct a message that consists of strings, numbers, timestamps and GUIDs, probably as a std::vector of boost::variants.
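A minimal sketch of such a message type, assuming C++17 so that std::variant can stand in for boost::variant (the two have similar interfaces). The Guid struct, the exact set of alternatives and make_example_message are hypothetical illustrations, not the original design.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical 128-bit GUID; a real system might use boost::uuids::uuid instead.
struct Guid {
    std::array<std::uint8_t, 16> bytes{};
};

using Timestamp = std::chrono::system_clock::time_point;

// One field of a message: a string, an integer, a floating-point number,
// a timestamp or a GUID. std::variant is used here in place of boost::variant.
using Field = std::variant<std::string, std::int64_t, double, Timestamp, Guid>;

// A message is an ordered list of fields.
using Message = std::vector<Field>;

// Builds an example message. By default std::vector, and std::string for
// strings longer than its small-string buffer, allocate from the heap,
// which is the non-deterministic cost the question is worried about.
Message make_example_message() {
    Message msg;
    msg.reserve(4);  // one heap allocation for the field array
    msg.emplace_back(std::string("order-filled"));
    msg.emplace_back(std::int64_t{42});
    msg.emplace_back(std::chrono::system_clock::now());
    msg.emplace_back(Guid{});
    return msg;
}
```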

I have always wanted to use alloca in past code of a similar nature. However, the systems programming literature always carries massive cautions against this function. Personally, I can't think of a server-class machine in the last 15 years that doesn't have virtual memory, and I know for a fact that the Windows stack grows one virtual-memory page at a time, so I assume Unices do as well. There is no brick wall here (anymore); the stack is just as likely to run out of space as the heap, so what gives? Why aren't people going gaga over alloca? I can think of many responsible use cases for alloca (string processing, anyone?).

Anyhow, I decided to test the performance difference (see below), and there is a 5-fold speed difference between alloca and malloc (the test captures how I would use alloca). So, have things changed? Should we just throw caution to the wind and use alloca (wrapped in a std::allocator) whenever we can be absolutely certain of the lifetime of our objects?

I am tired of living in fear!

Edit:

OK, so there are limits: on Windows the stack size is fixed at link time, and on Unix it seems to be tunable. It seems a page-aligned memory allocator is in order :D Does anyone know of a general-purpose portable implementation? :D
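One portable option, sketched here under the assumption of C++17 and a standard library that ships <memory_resource>, is an arena rather than alloca: std::pmr::monotonic_buffer_resource carves allocations out of a buffer you supply (a stack array here, though a page-aligned block would work the same way) and releases everything at once when it is destroyed. The function names, the 16 KiB buffer size and the string contents are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <string>
#include <vector>

// Builds a list of string fields whose memory all comes from `arena`.
// pmr containers pass their resource on to pmr elements, so the strings
// inside the vector allocate from the arena too.
std::size_t build_fields(std::pmr::memory_resource* arena, int count) {
    std::pmr::vector<std::pmr::string> fields(arena);
    fields.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        // Long enough to defeat the small-string optimisation.
        fields.emplace_back("field payload that is longer than the SSO buffer");
    }
    return fields.size();
}

// Processes one "message" against a fixed buffer on the stack.
// null_memory_resource() as the upstream means that exhausting the buffer
// throws std::bad_alloc instead of silently falling back to malloc.
// All memory is released at once when `arena` goes out of scope.
std::size_t process_message(int count) {
    alignas(std::max_align_t) std::byte buffer[16 * 1024];
    std::pmr::monotonic_buffer_resource arena(
        buffer, sizeof buffer, std::pmr::null_memory_resource());
    return build_fields(&arena, count);
}
```

Individual deallocations are no-ops for a monotonic resource, which is what keeps the cost predictable; the trade-off is that memory is only reclaimed when the whole arena goes away.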

Code:

#include <stdlib.h>
#include <time.h>
#include <alloca.h>   // declares alloca() on Linux/glibc; not part of ISO C or C++

#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>

using namespace boost::posix_time;

// Length of one simulated string field: 1..1023 bytes.
int random_string_size()
{
    return ( (rand() % 1023) + 1 );
}

// Number of fields in one simulated message: 1..31.
int random_vector_size()
{
    return ( (rand() % 31) + 1 );
}

// Allocates one "message" on the stack; everything is released
// automatically when the function returns. The buffers are never used,
// so an optimizing compiler may be able to drop some of this work.
void alloca_test()
{
    int vec_sz = random_vector_size();

    void ** vec = (void **) alloca(vec_sz * sizeof(void *));

    for(int i = 0 ; i < vec_sz ; i++)
    {
        vec[i] = alloca(random_string_size());
    }
}

// Same allocation pattern on the heap, freed explicitly.
void malloc_test()
{
    int vec_sz = random_vector_size();

    void ** vec = (void **) malloc(vec_sz * sizeof(void *));

    for(int i = 0 ; i < vec_sz ; i++)
    {
        vec[i] = malloc(random_string_size());
    }

    for(int i = 0 ; i < vec_sz ; i++)
    {
        free(vec[i]);
    }

    free(vec);
}

int main()
{
    srand( time(NULL) );
    ptime now;
    ptime after;

    int test_repeat = 100;
    int times = 100000;

    // Average wall-clock time of `times` calls to alloca_test().
    time_duration alloc_total;
    for(int ii=0; ii < test_repeat; ++ii)
    {
        now = microsec_clock::local_time();
        for(int i =0 ; i < times ; ++i)
        {
            alloca_test();
        }
        after = microsec_clock::local_time();
        alloc_total += after - now;
    }

    std::cout << "alloca_time: " << alloc_total/test_repeat << std::endl;

    // Average wall-clock time of `times` calls to malloc_test().
    time_duration malloc_total;
    for(int ii=0; ii < test_repeat; ++ii)
    {
        now = microsec_clock::local_time();
        for(int i =0 ; i < times ; ++i)
        {
            malloc_test();
        }
        after = microsec_clock::local_time();
        malloc_total += after - now;
    }

    std::cout << "malloc_time: " << malloc_total/test_repeat << std::endl;
}

output:

hassan@hassan-desktop:~/test$ ./a.out
alloca_time: 00:00:00.056302
malloc_time: 00:00:00.260059
hassan@hassan-desktop:~/test$ ./a.out
alloca_time: 00:00:00.056229
malloc_time: 00:00:00.256374
hassan@hassan-desktop:~/test$ ./a.out
alloca_time: 00:00:00.056119
malloc_time: 00:00:00.265731

--Edit: Results on home machine, clang, and google perftools--

Compiler and flags                            alloca_time       malloc_time
g++, no optimization flags                    00:00:00.025785   00:00:00.106345
g++ -O3                                       00:00:00.021838   00:00:00.111039
clang, no flags                               00:00:00.025503   00:00:00.104551
clang -O3 (alloca becomes magically faster)   00:00:00.013028   00:00:00.101729
g++ -O3 with perftools                        00:00:00.021137   00:00:00.043913
clang++ -O3 with perftools (the sweet spot)   00:00:00.013969   00:00:00.044468
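One possible contributor to the "magically faster" clang -O3 result is that the benchmark never uses its buffers, and an optimizing compiler may shrink or remove allocations whose memory is never used. A minimal sketch of a variant that touches each buffer and publishes its address through a volatile global, which makes the allocations harder to optimize away. It assumes a glibc or macOS system for <alloca.h>, uses std::chrono instead of Boost.Date_Time to stay self-contained, and the *_touching, g_escape and time_us names are hypothetical.

```cpp
#include <alloca.h>   // alloca() on glibc/macOS; an assumption about the platform
#include <cassert>
#include <chrono>
#include <cstdlib>

// Storing each pointer through this volatile global makes the allocations
// observable, so the optimizer cannot simply delete them.
void* volatile g_escape = nullptr;

int random_string_size() { return (std::rand() % 1023) + 1; }
int random_vector_size() { return (std::rand() % 31) + 1; }

void alloca_test_touching() {
    int vec_sz = random_vector_size();
    void** vec = static_cast<void**>(alloca(vec_sz * sizeof(void*)));
    for (int i = 0; i < vec_sz; ++i) {
        vec[i] = alloca(random_string_size());
        static_cast<char*>(vec[i])[0] = 1;  // touch the memory
        g_escape = vec[i];                  // publish the pointer
    }
}

void malloc_test_touching() {
    int vec_sz = random_vector_size();
    void** vec = static_cast<void**>(std::malloc(vec_sz * sizeof(void*)));
    assert(vec != nullptr);
    for (int i = 0; i < vec_sz; ++i) {
        vec[i] = std::malloc(random_string_size());
        assert(vec[i] != nullptr);
        static_cast<char*>(vec[i])[0] = 1;
        g_escape = vec[i];
    }
    for (int i = 0; i < vec_sz; ++i) std::free(vec[i]);
    std::free(vec);
}

// Wall-clock time in microseconds for `times` calls of `f`.
template <class F>
long long time_us(F f, int times) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < times; ++i) f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling time_us(alloca_test_touching, 100000) and time_us(malloc_test_touching, 100000) from a -O3 build gives numbers that are closer to what a real workload, which does use its buffers, would see.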


asked Apr 27 '11 16:04

Hassan Syed

2 Answers

Well, first of all, just because there is a lot of virtual memory doesn't mean your process will be allowed to fill it. On *nix there are stack size limits, whereas the heap is a lot more forgiving.

If you're only going to be allocating a few hundred or a few thousand bytes, sure, go ahead. Anything beyond that is going to depend on what limits (ulimit) are in place on any given system, and that's just a recipe for disaster.
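A common way to stay within "a few hundred or a few thousand bytes" without rejecting larger inputs is to use alloca only below a fixed threshold and fall back to malloc above it. A minimal sketch, assuming a glibc or macOS system for <alloca.h>; kStackThreshold and parse_long_field are hypothetical, and the threshold should come from the stack limits of the systems you actually deploy on.

```cpp
#include <alloca.h>   // assumes glibc or macOS; MSVC uses <malloc.h> and _alloca
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical threshold: scratch buffers up to this many bytes go on the stack.
constexpr std::size_t kStackThreshold = 1024;

// Parses a base-10 integer from a field that is NOT NUL-terminated
// (pointer + length), by copying it into a scratch buffer that strtol
// can read. Small fields use the stack; large ones use the heap, so an
// unexpectedly long field cannot overflow the stack.
long parse_long_field(const char* data, std::size_t len) {
    std::size_t n = len + 1;  // room for the terminator
    bool on_heap = n > kStackThreshold;
    char* buf = on_heap ? static_cast<char*>(std::malloc(n))
                        : static_cast<char*>(alloca(n));
    assert(buf != nullptr);

    std::memcpy(buf, data, len);
    buf[len] = '\0';
    long value = std::strtol(buf, nullptr, 10);

    if (on_heap) std::free(buf);  // stack memory is released when we return
    return value;
}
```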

See also: Why is the use of alloca() not considered good practice?

On my development box at work (Gentoo) I have a default stack size limit of 8192 KB. That's not very big, and if alloca overflows the stack, the behavior is undefined.
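The same limit can be read from inside the program. A minimal sketch, assuming a POSIX system: getrlimit(RLIMIT_STACK) returns the soft limit that `ulimit -s` prints (in KiB) from the shell, so an 8192 KB limit shows up as 8388608 bytes. It describes the main thread's stack; threads created through pthreads may get a different size from their thread attributes. The helper names are hypothetical.

```cpp
#include <sys/resource.h>   // POSIX getrlimit(), RLIMIT_STACK, RLIM_INFINITY
#include <cassert>
#include <cstdio>

// Returns the soft stack-size limit of the process in bytes,
// RLIM_INFINITY if it is unlimited, or 0 if the query fails.
rlim_t stack_soft_limit() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) return 0;
    return rl.rlim_cur;
}

// Prints the limit in KiB, the unit `ulimit -s` uses.
void print_stack_limit() {
    rlim_t lim = stack_soft_limit();
    if (lim == RLIM_INFINITY)
        std::printf("stack limit: unlimited\n");
    else
        std::printf("stack limit: %llu KiB\n",
                    static_cast<unsigned long long>(lim / 1024));
}
```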


answered Oct 15 '22 17:10

Chris Eberle

I think you need to be a little careful in understanding what alloca actually is. Unlike malloc, which goes to the heap and searches through buckets and linked lists of various buffers, alloca simply takes your stack register (ESP on x86) and moves it to create a "hole" on your thread's stack where you can store whatever you want. That's why it's uber-fast: just one (or a few) assembly instructions.

So, as others pointed out, it's not the virtual memory you need to worry about but the size reserved for the stack. Although others limit themselves to a few hundred bytes, as long as you know your application and are careful about it, you can go further: we've allocated up to 256 KB without any problems (the default stack size, at least for Visual Studio, is 1 MB, and you can always increase it if you need to).
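On POSIX systems, one way to increase the stack for a particular thread is pthread_attr_setstacksize (on Windows the equivalent knobs are the /STACK linker option and the stack-size argument of CreateThread). A minimal sketch, assuming a POSIX platform with <alloca.h>; the 256 KiB scratch buffer echoes the figure above, while the 4 MiB stack in the usage line and the function names are hypothetical.

```cpp
#include <alloca.h>     // glibc/macOS; an assumption about the platform
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Worker that uses a 256 KiB scratch buffer on its own stack and writes
// one byte of it back through `arg` so the caller can check it ran.
void* big_stack_worker(void* arg) {
    const std::size_t kScratch = 256 * 1024;
    unsigned char* scratch = static_cast<unsigned char*>(alloca(kScratch));
    std::memset(scratch, 0xAB, kScratch);  // touch every page
    *static_cast<int*>(arg) = scratch[kScratch - 1];
    return nullptr;
}

// Runs big_stack_worker on a thread whose stack size is set explicitly
// instead of relying on the platform default. Returns false on any error.
bool run_with_stack(std::size_t stack_bytes, int* result) {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) return false;
    bool ok = pthread_attr_setstacksize(&attr, stack_bytes) == 0;
    pthread_t tid;
    ok = ok && pthread_create(&tid, &attr, big_stack_worker, result) == 0;
    pthread_attr_destroy(&attr);
    ok = ok && pthread_join(tid, nullptr) == 0;  // only joins if creation succeeded
    return ok;
}
```

Usage: `int r = 0; run_with_stack(4 * 1024 * 1024, &r);` runs the worker on a 4 MiB stack, leaving plenty of headroom for its 256 KiB buffer.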

Also, you really can't use alloca as a general-purpose allocator (i.e. wrapped inside another function), because whatever memory alloca allocates for you is gone when the stack frame of the current function is popped (i.e. when the function exits).
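The lifetime rule in practice: alloca must be called in the function that uses the buffer, and only plain values may leave that function. A minimal sketch of the "string processing" case from the question, assuming a POSIX system (strtok_r and <alloca.h>) and inputs known to be short, since the copy size is not bounded here; count_tokens and the commented-out stack_copy are hypothetical.

```cpp
#include <alloca.h>     // glibc/macOS; MSVC would use <malloc.h> and _alloca
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string.h>     // strtok_r is POSIX, not ISO C++

// WRONG (do not do this): the buffer is released when the helper returns,
// so the caller receives a dangling pointer.
//
//   char* stack_copy(const char* s) {
//       char* p = static_cast<char*>(alloca(std::strlen(s) + 1));
//       return std::strcpy(p, s);
//   }

// OK: alloca is called in the function that uses the buffer, and only a
// plain value (the token count) leaves the function.
int count_tokens(const char* s, const char* delims) {
    std::size_t n = std::strlen(s) + 1;
    char* copy = static_cast<char*>(alloca(n));  // freed when count_tokens returns
    std::memcpy(copy, s, n);                     // strtok_r modifies its input

    int count = 0;
    char* save = nullptr;
    for (char* tok = strtok_r(copy, delims, &save); tok != nullptr;
         tok = strtok_r(nullptr, delims, &save)) {
        ++count;
    }
    return count;
}
```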

I've also seen people say that alloca is not completely cross-platform, but if you are writing a specific application for a specific platform and you have the option of using alloca, it is sometimes the best option you have, as long as you understand the implications of increased stack usage.
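The portability gap is mostly about where the function lives: MSVC provides _alloca in <malloc.h>, while GCC and Clang provide __builtin_alloca. A minimal sketch of a shim, assuming those three compilers are the ones you care about; STACK_ALLOC and is_palindrome are hypothetical. It has to be a macro rather than a function, so that the allocation happens in the frame of the function that expands it. On Windows, _malloca/_freea are a related option that may fall back to the heap for larger requests and must always be paired.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

#if defined(_MSC_VER)
  #include <malloc.h>                          // MSVC declares _alloca here
  #define STACK_ALLOC(n) _alloca(n)
#elif defined(__GNUC__) || defined(__clang__)
  #define STACK_ALLOC(n) __builtin_alloca(n)   // compiler builtin, no header needed
#else
  #error "no known alloca for this compiler"
#endif

// Checks whether `s` reads the same backwards, using a reversed copy
// that lives on the stack until this function returns.
bool is_palindrome(const char* s) {
    std::size_t len = std::strlen(s);
    char* rev = static_cast<char*>(STACK_ALLOC(len + 1));
    std::memcpy(rev, s, len + 1);   // copy including the terminator
    std::reverse(rev, rev + len);   // reverse the characters, not the terminator
    return std::strcmp(rev, s) == 0;
}
```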

answered Oct 15 '22 17:10

DXM
