I am using libuv for a heavily network-bound application, and I am concerned about which techniques for re-using allocated memory are both efficient and safe given libuv's deferred execution of callbacks.
At the most basic layer exposed to the libuv user, one needs to specify a buffer allocation callback when setting up a reader on a handle:
UV_EXTERN int uv_read_start(uv_stream_t*, uv_alloc_cb alloc_cb, uv_read_cb read_cb);
where uv_alloc_cb is
typedef void (*uv_alloc_cb)(uv_handle_t* handle, size_t suggested_size, uv_buf_t* buf);
But here is the problem: this memory-allocating callback is invoked each time a new message comes in via the handle (say, each time a UDP datagram is received on a uv_udp_t handle), and a straightforward allocation of a new buffer for every incoming UDP datagram seems very memory-unwise.
So I am asking for common C techniques (probably within the deferred execution context introduced by libuv's callback system) for having the same allocated memory re-used when possible.
Also, I would like to stay Windows-portable, if possible.
Notes:
Reading http://nikhilm.github.io/uvbook/filesystem.html , I noticed the following phrase under the snippet uvtee/main.c - Write to pipe:
We make a copy so we can free the two buffers from the two calls to write_data independently of each other. While acceptable for a demo program like this, you’ll probably want smarter memory management, like reference counted buffers or a pool of buffers in any major application.
but I wasn't able to find any solutions involving reference counting on libuv buffers (how could this be properly performed?) or explicit examples of buffer pools in a libuv environment (are there any libraries for that?).
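For illustration, this is roughly the kind of thing I imagine by "reference counted buffers" - my own hypothetical sketch, not code from libuv or any library:

#include <stdlib.h>

/* Hypothetical reference-counted buffer: the count lives in a header
 * placed in front of the data area. A plain int is enough because all
 * libuv callbacks run on the same loop thread. */
typedef struct {
    int refs;
    size_t len;
} rcbuf_t;

#define rcbuf_header(p) ((rcbuf_t *)(p) - 1)

char *rcbuf_new(size_t len) {
    rcbuf_t *h = malloc(sizeof(rcbuf_t) + len);
    if (!h) return NULL;
    h->refs = 1;
    h->len = len;
    return (char *)(h + 1); /* hand out the data area, not the header */
}

void rcbuf_ref(char *p) { rcbuf_header(p)->refs++; }

void rcbuf_unref(char *p) {
    if (--rcbuf_header(p)->refs == 0) free(rcbuf_header(p));
}

i.e. every uv_write() that shares the buffer would rcbuf_ref() it first, and each write callback would rcbuf_unref() it, so the memory is freed exactly once, after the last write completes.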
I would like to share my own experience in solving this problem. I can feel your pain and confusion, but in reality it's not overly hard to implement a working solution, considering the vast array of options you have if you know what you are doing.
The basic pooling strategy:

- Implement a pool of buffers capable of performing two operations: acquire and release.
- The pool may be of fixed or variable size. "Variable" means that initially there are M pre-allocated buffers (zero, for instance) and the pool can grow on demand up to N; "fixed" means that all buffers are pre-allocated upon the pool's creation (M = N).
- Implement a callback which acquires buffers from the pool for libuv.
- Do not allow infinite pool growth, yet keep the pool functional in all circumstances except out-of-memory situations.
Now, let's shed some more light on all of this in detail.
The pool structure:
#include <assert.h>  /* assert() in bufpool_enqueue below */
#include <stdlib.h>  /* malloc/free */
#include <uv.h>

#define BUFPOOL_CAPACITY 100

typedef struct bufpool_s bufpool_t;

struct bufpool_s {
    void *bufs[BUFPOOL_CAPACITY];
    int size;
};
size is the current pool size.
A buffer itself is a memory block prefixed with the following structure:
#define bufbase(ptr) ((bufbase_t *)((char *)(ptr) - sizeof(bufbase_t)))
#define buflen(ptr) (bufbase(ptr)->len)
typedef struct bufbase_s bufbase_t;
struct bufbase_s {
    bufpool_t *pool;
    int len;
};
len is the length of the buffer in bytes.
Allocation of a new buffer looks like this:
void *bufpool_alloc(bufpool_t *pool, int len) {
    bufbase_t *base = malloc(sizeof(bufbase_t) + len);
    if (!base) return 0;
    base->pool = pool;
    base->len = len;
    return (char *)base + sizeof(bufbase_t);
}
Note that the returned pointer points to the first byte after the header, i.e. the data area. This lets buffer pointers be treated as though they were allocated via a standard call to malloc.
Deallocation is the opposite:
void bufpool_free(void *ptr) {
    if (!ptr) return;
    free(bufbase(ptr));
}
The allocation callback for libuv looks like this:
void alloc_cb(uv_handle_t *handle, size_t size, uv_buf_t *buf) {
    int len;
    void *ptr = bufpool_acquire(handle->loop->data, &len);
    *buf = uv_buf_init(ptr, len);
}
You can see here that alloc_cb takes the buffer pool's pointer from the user data pointer on the loop. That means the buffer pool should be attached to the event loop prior to usage. In other words, you should initialize a pool upon creation of the loop and assign its pointer to the loop's data field. If you already hold other user data in that field, just extend your structure.
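For instance, if the data field already carries state, a small wrapper struct keeps both reachable (loop_ctx_t is a hypothetical name; the callback then dereferences the wrapper):

/* Hypothetical wrapper for loop->data when it must carry more than the pool. */
typedef struct {
    bufpool_t *pool;
    /* ... other per-loop user data ... */
} loop_ctx_t;

void alloc_cb(uv_handle_t *handle, size_t size, uv_buf_t *buf) {
    loop_ctx_t *ctx = handle->loop->data; /* fetch the wrapper, not the pool */
    int len;
    void *ptr = bufpool_acquire(ctx->pool, &len);
    *buf = uv_buf_init(ptr, len);
}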
A dummy buffer is a fake buffer in the sense that it doesn't originate in the pool but is still fully functional. Its purpose is to keep the whole thing working in rare situations of pool starvation, i.e. when all buffers are acquired and yet another one is needed. Based on my research, allocation of small memory blocks of about 8 KB is done very quickly on all modern OSes, which suits the size of a dummy buffer well.
#define DUMMY_BUF_SIZE 8000
void *bufpool_dummy() {
    return bufpool_alloc(0, DUMMY_BUF_SIZE);
}
The acquire operation:
void *bufpool_acquire(bufpool_t *pool, int *len) {
    void *buf = bufpool_dequeue(pool);
    if (!buf) buf = bufpool_dummy();
    *len = buf ? buflen(buf) : 0;
    return buf;
}
The release operation:
void bufpool_release(void *ptr) {
    bufbase_t *base;
    if (!ptr) return;
    base = bufbase(ptr);
    if (base->pool) bufpool_enqueue(base->pool, ptr);
    else free(base);
}
Two functions remain - bufpool_enqueue and bufpool_dequeue. Basically, they perform all the pool's work.
In my case, there is an O(1) queue of buffer indexes on top of what is described above, which lets me track the pool's state and fetch buffer indexes very quickly (a sketch of such a variant follows the linear implementation below). It's not necessary to go to such extremes, though, because the maximum size of the pool is limited, so any array search is constant-time as well.
In the simplest case, you can implement these functions as pure linear searches over the bufs array in the bufpool_s structure. For example, when a buffer is acquired, you search for the first non-NULL spot, save the pointer, and put NULL in that spot. When a buffer is released, you search for the first NULL spot and store the pointer there.
The pool internals are as follows:
#define BUF_SIZE 64000
void *bufpool_grow(bufpool_t *pool) {
    int idx = pool->size;
    void *buf;
    if (idx == BUFPOOL_CAPACITY) return 0;
    buf = bufpool_alloc(pool, BUF_SIZE);
    if (!buf) return 0;
    pool->bufs[idx] = 0; /* the new slot starts out NULL: its buffer is handed out right away */
    pool->size = idx + 1;
    return buf;
}
void bufpool_enqueue(bufpool_t *pool, void *ptr) {
    int idx;
    for (idx = 0; idx < pool->size; ++idx) {
        if (!pool->bufs[idx]) break;
    }
    assert(idx < pool->size);
    pool->bufs[idx] = ptr;
}

void *bufpool_dequeue(bufpool_t *pool) {
    int idx;
    void *ptr;
    for (idx = 0; idx < pool->size; ++idx) {
        ptr = pool->bufs[idx];
        if (ptr) {
            pool->bufs[idx] = 0;
            return ptr;
        }
    }
    return bufpool_grow(pool);
}
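For completeness, here is a sketch of the O(1) variant mentioned earlier, as drop-in replacements for the two functions above. Idle buffers are kept as a stack at the front of bufs, so neither operation scans; it assumes an extra int navail field in bufpool_s and correspondingly adjusted bufpool_grow and bufpool_done routines:

/* Sketch: bufs[0..navail) holds the idle buffers as a stack. */
void *bufpool_dequeue(bufpool_t *pool) {
    if (pool->navail > 0) return pool->bufs[--pool->navail]; /* pop an idle buffer */
    return bufpool_grow(pool); /* nothing idle: grow on demand or return 0 */
}

void bufpool_enqueue(bufpool_t *pool, void *ptr) {
    assert(pool->navail < BUFPOOL_CAPACITY);
    pool->bufs[pool->navail++] = ptr; /* push the buffer back as idle */
}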
The normal buffer size is 64000 bytes because I want it to fit comfortably in a 64 KB block together with its header.
And finally, initialization and de-initialization routines:
void bufpool_init(bufpool_t *pool) {
    pool->size = 0;
}

void bufpool_done(bufpool_t *pool) {
    int idx;
    for (idx = 0; idx < pool->size; ++idx) bufpool_free(pool->bufs[idx]);
}
Please note that this implementation is simplified for illustrative purposes. There is no pool shrinking policy here, whereas in a real-world scenario one will most likely be required.
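A deliberately simplistic sketch of such a policy, consistent with the slot scheme above: trim idle buffers from the tail of the array, for example from a periodic uv_timer_t. Only tail slots can be given back without breaking the invariant that every acquired buffer still finds a NULL slot on release; bufpool_shrink is a hypothetical addition:

/* Hypothetical shrink pass: free idle buffers in the highest slots and
 * shrink the slot count, so bufpool_grow can re-extend the pool later. */
void bufpool_shrink(bufpool_t *pool) {
    while (pool->size > 0 && pool->bufs[pool->size - 1]) {
        bufpool_free(pool->bufs[pool->size - 1]);
        pool->bufs[--pool->size] = 0;
    }
}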
You should be able to write your libuv callbacks now:
void read_cb(uv_stream_t *stream, ssize_t nread, const uv_buf_t *buf) {
    /* ... check nread (< 0 means error or EOF) and consume the data ... */
    bufpool_release(buf->base); /* Release the buffer */
}
Loop initialization:
uv_loop_t *loop = malloc(sizeof(*loop));
bufpool_t *pool = malloc(sizeof(*pool));
uv_loop_init(loop);
bufpool_init(pool);
loop->data = pool;
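And the matching teardown once the loop has finished (a sketch; uv_loop_close fails with UV_EBUSY while handles are still open):

uv_loop_close(loop);  /* only succeeds after all handles are closed */
bufpool_done(pool);   /* free every buffer still sitting in the pool */
free(pool);
free(loop);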
Operation:
uv_tcp_t *tcp = malloc(sizeof(*tcp));
uv_tcp_init(loop, tcp);
/* ... */
uv_read_start((uv_stream_t *)tcp, alloc_cb, read_cb);
It's also a good idea to use an adaptive strategy when fetching a buffer, depending on the requested size: return pooled buffers only when a big chunk of data is requested (e.g. all reads and long writes) and dummy buffers in the other cases (e.g. most writes). This helps avoid wasting pooled buffers while maintaining an acceptable allocation speed. For example:
void alloc_cb(uv_handle_t *handle, size_t size, uv_buf_t *buf) {
    int len = (int)size; /* requested buffer size */
    void *ptr = bufpool_acquire(handle->loop->data, &len);
    *buf = uv_buf_init(ptr, len);
}

void *bufpool_acquire(bufpool_t *pool, int *len) {
    int size = *len;
    void *buf;
    if (size > DUMMY_BUF_SIZE) {
        buf = bufpool_dequeue(pool);
        if (buf) {
            if (size > BUF_SIZE) *len = BUF_SIZE;
            return buf;
        }
        size = DUMMY_BUF_SIZE;
    }
    buf = bufpool_alloc(0, size);
    *len = buf ? size : 0;
    return buf;
}
P.S. There is no need for buflen and bufpool_dummy with this snippet.
If you are on Linux, you are in luck. The Linux kernel usually uses what is known as a SLAB allocator by default. The advantage of this allocator is that it reduces actual memory allocations by maintaining pools of recyclable blocks. What it means for you is that as long as you always allocate buffers of the same size (ideally power-of-two multiples of PAGE_SIZE), you are okay with using malloc() on Linux.
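With that approach the libuv callbacks can stay trivial. A minimal sketch; the 64 KB size is just an example of a fixed power-of-two block:

#include <stdlib.h>
#include <uv.h>

#define READ_BUF_SIZE 65536 /* always request the same power-of-two size */

void alloc_cb(uv_handle_t *handle, size_t suggested_size, uv_buf_t *buf) {
    /* Same-sized allocations let the allocator recycle freed blocks. */
    *buf = uv_buf_init(malloc(READ_BUF_SIZE), READ_BUF_SIZE);
}

void read_cb(uv_stream_t *stream, ssize_t nread, const uv_buf_t *buf) {
    /* ... check nread and consume the data ... */
    free(buf->base); /* the block goes back to the allocator's cache */
}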
If you are not on Linux (or FreeBSD or Solaris), or if you are developing a cross-platform application, you may consider using glib and its Memory Slices, which are a cross-platform implementation of the SLAB allocator. It uses native implementations on platforms that support them, so using it on Linux will bring no advantage (I ran some tests myself). I'm sure there are other libraries that can do the same, or you can implement it yourself.
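That would be the same pair of callbacks with the slice API swapped in (a sketch; g_slice_free1 needs the block size back, which a fixed size provides):

#include <glib.h>
#include <uv.h>

#define SLICE_SIZE 65536

void alloc_cb(uv_handle_t *handle, size_t suggested_size, uv_buf_t *buf) {
    *buf = uv_buf_init(g_slice_alloc(SLICE_SIZE), SLICE_SIZE);
}

void read_cb(uv_stream_t *stream, ssize_t nread, const uv_buf_t *buf) {
    /* ... consume the data ... */
    g_slice_free1(SLICE_SIZE, buf->base); /* return the slice for re-use */
}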