Can I allocate memory faster by using multiple threads?

If I make a loop that allocates 1024-element integer arrays (int[1024]), and I want it to allocate 10000 of them, can I make it faster by running the memory allocations from multiple threads?

I want them to be allocated on the heap.

Let's assume that I have a multi-core processor for the job.

I already tried this, but it decreased performance. I'm just wondering: did I write bad code, or is there something I didn't know about memory allocation?
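For reference, a hedged sketch of the kind of experiment described: allocate the 10000 int[1024] arrays from several threads and time the whole run. The thread count and helper names are assumptions for illustration, not code from the question.

```cpp
// Sketch: split the 10,000 int[1024] allocations across several threads and
// time them. Thread count and names are illustrative only.
#include <chrono>
#include <cstddef>
#include <functional>
#include <iostream>
#include <thread>
#include <vector>

static void allocate_arrays(std::size_t count, std::vector<int*>& out) {
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        out.push_back(new int[1024]);   // every request goes through the allocator
}

int main() {
    const std::size_t total = 10000;
    const unsigned threads = 4;         // assumed core count

    std::vector<std::vector<int*>> results(threads);
    auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t)
        workers.emplace_back(allocate_arrays, total / threads, std::ref(results[t]));
    for (auto& w : workers) w.join();

    auto elapsed = std::chrono::steady_clock::now() - start;
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count()
              << " us\n";

    for (auto& v : results)             // release everything again
        for (int* p : v) delete[] p;
}
```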

Does the answer depend on the OS? If so, please tell me how it works on different platforms.

Edit:

The integer array allocation loop was just a simplified example. Don't bother telling me how I can improve that.

asked May 09 '11 by 0xbaadf00d


1 Answer

It depends on many things, but primarily:

  • the OS
  • the implementation of malloc you are using

The OS is responsible for allocating the "virtual memory" that your process has access to, and it builds a translation table that maps virtual addresses back to actual (physical) memory addresses.
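As an illustration, on Linux a process can ask the kernel for fresh pages of virtual memory directly with mmap (Windows uses VirtualAlloc for the same purpose); malloc implementations do roughly this in bulk and then carve up the mapped region. This is only a sketch of that layer, not what any particular allocator does verbatim:

```cpp
// Sketch of the layer below malloc on Linux: requesting anonymous pages from
// the kernel with mmap and handing them back with munmap.
#include <sys/mman.h>
#include <cstddef>
#include <cstdio>

int main() {
    const std::size_t size = 1 << 20;  // ask for 1 MiB of anonymous memory
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        std::perror("mmap");
        return 1;
    }
    // The kernel has only reserved address space here; physical pages are
    // typically wired in lazily, on first touch, via page faults.
    static_cast<char*>(p)[0] = 1;

    munmap(p, size);  // return the region to the OS
    return 0;
}
```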

Now, the default implementation of malloc is generally conservative and will simply put a giant lock around all of this. This means that requests are processed serially, and the only thing that allocating from multiple threads instead of one achieves is slowing the whole thing down.
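A toy illustration of that bottleneck, assuming a single global mutex around the heap (this is not a real malloc, just a sketch of why adding threads only adds contention):

```cpp
// Toy allocator with one global lock: every thread must queue on the same
// mutex for each request, so allocations are effectively serialized.
#include <cstddef>
#include <mutex>
#include <new>

namespace toy {
std::mutex heap_mutex;                 // one lock guarding the whole heap

void* allocate(std::size_t bytes) {
    std::lock_guard<std::mutex> lock(heap_mutex);  // all threads wait here
    return ::operator new(bytes);      // stand-in for the real bookkeeping
}

void deallocate(void* p) {
    std::lock_guard<std::mutex> lock(heap_mutex);
    ::operator delete(p);
}
}  // namespace toy
```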

There are more clever allocation schemes, generally based upon pools, and they can be found in other malloc implementations: tcmalloc (from Google) and jemalloc (used by Facebook) are two such implementations designed for high performance in multi-threaded applications.

There is no silver bullet though, and at one point the OS must perform the virtual <=> real translation which requires some form of locking.

Your best bet is to allocate by arenas:

  • Allocate big chunks (arenas) at once
  • Split them up in arrays of the appropriate size

There is no need to parallelize the arena allocation, and you'll be better off asking for the biggest arenas you can get (bear in mind that allocation requests for too large an amount may fail); you can then parallelize the split.
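A minimal sketch of that approach, using the 10000 x int[1024] numbers from the question (variable names are illustrative):

```cpp
// Arena approach: one big allocation up front, then split it into 1024-int
// slices. Only the single big allocation touches the allocator/OS; the split
// is plain pointer arithmetic and could be done from several threads without
// any locking.
#include <cstddef>
#include <memory>
#include <vector>

int main() {
    const std::size_t array_len = 1024;
    const std::size_t array_count = 10000;

    // One allocation for the whole arena (~40 MB here).
    std::unique_ptr<int[]> arena(new int[array_len * array_count]);

    // Split: arrays[i] points at the i-th 1024-int slice of the arena.
    std::vector<int*> arrays(array_count);
    for (std::size_t i = 0; i < array_count; ++i)
        arrays[i] = arena.get() + i * array_len;

    // arrays[i][j] can now be used like an independently allocated int[1024];
    // everything is released at once when `arena` goes out of scope.
    arrays[42][7] = 123;
    return 0;
}
```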

tcmalloc and jemalloc may help a bit; however, they are not designed for big allocations (which are unusual), and I do not know whether it is possible to configure the size of the arenas they request.

answered Sep 30 '22 by Matthieu M.