Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

libuv allocated memory buffers re-use techniques

i am using libuv for my extensively-network-interacting application and i am concerned about which techniques of re-using allocated memory would be at the same time efficent and safe with libuv callback deferrence of execution.

At very basic layer, exposed to libuv user, one is getting need to specify buffer allocation callback alongside with setting up a handle reader:

UV_EXTERN int uv_read_start(uv_stream_t*, uv_alloc_cb alloc_cb, uv_read_cb read_cb);

where uv_alloc_cb is

typedef void (*uv_alloc_cb)(uv_handle_t* handle, size_t suggested_size, uv_buf_t* buf);

But here is the problem: this memory-allocating callback is invoked each time new message is comming via handle (say, each UDP datagram from uv_udp_t handle is received) and stright-forward allocation of new buffer for each incoming UDP datagram seems very non-memory-wise.

So i am asking for a common C techniques (probably, within deferred execution context introduced by libuv callback system) of having the same allocated memory be re-used when possible.

Also, i would like to stay windows-portable, if possible.

Notes:

  • i am aware of this question: Does libuv provide any facilities to attach a buffer to a connection and re use it ; it's accepted answer does not answers how to actually do the memory allocation right with libuv besides stating the fact that statically-allocated buffer is no-go. Especially, it is not covering the safety (with deferred write callbacks on the same buffer, which can overlap with another read callback invocation across multiple iterations of libuv main loop) aspect of buffer attached to a handle (either via wrapper structure or handle->data context).
  • Reading http://nikhilm.github.io/uvbook/filesystem.html , i have noticed the following phrase under the snip uvtee/main.c - Write to pipe:

    We make a copy so we can free the two buffers from the two calls to write_data independently of each other. While acceptable for a demo program like this, you’ll probably want smarter memory management, like reference counted buffers or a pool of buffers in any major application.

    but i wasn't able to find any solutions involving reference counting on libuv buffers (how this could be properly performed?) or explicit examples of pools of buffers in libuv environment (is there any libraries for that?).

like image 286
Alexander Tumin Avatar asked Feb 14 '15 01:02

Alexander Tumin


2 Answers

I would like to share my own experience in solving this problem. I can feel your pain and confusion, but in reality, it's not overly hard to implement a working solution considering the vast array of options you have if you know what you are doing.

Objective

  1. Implement a pool of buffers capable of performing two operations - acquire and release.

  2. The basic pooling strategy:

    • acquire withdraws a buffer from the pool effectively reducing the available number of buffers by 1;
    • If there are no buffers available, two options arise:
      • grow the pool and return a newly created buffer; or
      • create and return a dummy buffer (explained below).
    • release returns a buffer back to the pool.
  3. The pool may be of fixed or variable size. "Variable" means that initially there are M pre-allocated buffers (zero for instance), and the pool can grow on demand up to N. "Fixed" means that all buffers are pre-allocated upon the pool's creation (M = N).

  4. Implement a callback which acquires buffers for libuv.

  5. Do not allow infinite pool growth still having the pool functional in any circumstances except for out-of-memory situations.

Implementation

Now, let's shed some more light on all this in details.

The pool structure:

#define BUFPOOL_CAPACITY 100

typedef struct bufpool_s bufpool_t;

struct bufpool_s {
    void *bufs[BUFPOOL_CAPACITY];
    int size;
};

size is the current pool size.

A buffer itself is a memory block prefixed with the following structure:

#define bufbase(ptr) ((bufbase_t *)((char *)(ptr) - sizeof(bufbase_t)))
#define buflen(ptr) (bufbase(ptr)->len)

typedef struct bufbase_s bufbase_t;

struct bufbase_s {
    bufpool_t *pool;
    int len;
};

len is the length of the buffer in bytes.

Allocation of a new buffer looks like this:

void *bufpool_alloc(bufpool_t *pool, int len) {
    bufbase_t *base = malloc(sizeof(bufbase_t) + len);
    if (!base) return 0;
    base->pool = pool;
    base->len = len;
    return (char *)base + sizeof(bufbase_t);
}

Note that the returned pointer points to the next byte after the header - the data area. This allows to have buffer pointers as though they were allocated via the standard call to malloc.

Deallocation is the opposite:

void bufpool_free(void *ptr) {
    if (!ptr) return;
    free(bufbase(ptr));
}

The allocation callback for libuv looks like this:

void alloc_cb(uv_handle_t *handle, size_t size, uv_buf_t *buf) {
    int len;
    void *ptr = bufpool_acquire(handle->loop->data, &len);
    *buf = uv_buf_init(ptr, len);
}

You can see here that alloc_cb takes a buffer pool's pointer from the user data pointer on the loop. That means that a buffer pool should be attached to an event loop prior to usage. In other words, you should initialize a pool upon creation of the loop and assign its pointer to the data field. If you already hold other user data in that field, just extend your structure.

A dummy buffer is a fake buffer which means it doesn't originate in the pool but is still fully functional. The purpose of dummy buffers is to keep the whole thing working in rare situations of pool starvation, i.e. when all buffers are acquired and there is a need for another one. Based on my research, allocation of small blocks of memory about 8Kb is done very quickly on all modern OSes - that suits the size of a dummy buffer well.

#define DUMMY_BUF_SIZE 8000

void *bufpool_dummy() {
    return bufpool_alloc(0, DUMMY_BUF_SIZE);
}

The acquire operation:

void *bufpool_acquire(bufpool_t *pool, int *len) {
    void *buf = bufpool_dequeue(pool);
    if (!buf) buf = bufpool_dummy();
    *len = buf ? buflen(buf) : 0;
    return buf;
}

The release operation:

void bufpool_release(void *ptr) {
    bufbase_t *base;
    if (!ptr) return;
    base = bufbase(ptr);
    if (base->pool) bufpool_enqueue(base->pool, ptr);
    else free(base);
}

There are two functions around here - bufpool_enqueue and bufpool_dequeue. Basically, they perform all the pool's work.

In my case, there is a O(1) queue of buffer indexes on top of the said above which allows me to keep track of the pool's state more efficiently fetching indexes of buffers very quickly. It's not necessary to go extreme like I did because the maximum size of the pool is limited, hence any array search will be constant in time as well.

In the simplest case, you can implement these functions as pure linear searchers throughout the bufs array in the bufpool_s structure. For example, if a buffer gets acquired, you search for the first non-NULL spot, save the pointer and put NULL in that spot. Next time when the buffer is released, you search for the first NULL spot and save its pointer in there.

The pool internals as follows:

#define BUF_SIZE 64000

void *bufpool_grow(bufpool_t *pool) {
    int idx = pool->size;
    void *buf;
    if (idx == BUFPOOL_CAPACITY) return 0;
    buf = bufpool_alloc(pool, BUF_SIZE);
    if (!buf) return 0;
    pool->bufs[idx] = 0;
    pool->size = idx + 1;
    return buf;
}

void bufpool_enqueue(bufpool_t *pool, void *ptr) {
    int idx;
    for (idx = 0; idx < pool->size; ++idx) {
        if (!pool->bufs[idx]) break;
    }
    assert(idx < pool->size);
    pool->bufs[idx] = ptr;
}

void *bufpool_dequeue(bufpool_t *pool) {
    int idx;
    void *ptr;
    for (idx = 0; idx < pool->size; ++idx) {
        ptr = pool->bufs[idx];
        if (ptr) {
            pool->bufs[idx] = 0;
            return ptr;
        }
    }
    return bufpool_grow(pool);
}

The normal buffer size is 64000 bytes because I want it to comfortably fit in a 64Kb block with its header.

And finally, initialization and de-initialization routines:

void bufpool_init(bufpool_t *pool) {
    pool->size = 0;
}

void bufpool_done(bufpool_t *pool) {
    int idx;
    for (idx = 0; idx < pool->size; ++idx) bufpool_free(pool->bufs[idx]);
}

Please note that this implementation is simplified for illustrative purposes. There is no pool shrinking policy here whereas in a real world scenario, it will be most likely required.

Usage

You should be able to write your libuv callbacks now:

void read_cb(uv_stream_t *stream, ssize_t nread, const uv_buf_t *buf) {
    /* ... */
    bufpool_release(buf->base); /* Release the buffer */
}

Loop initialization:

uv_loop_t *loop = malloc(sizeof(*loop));
bufpool_t *pool = malloc(sizeof(*pool));
uv_loop_init(loop);
bufpool_init(pool);
loop->data = pool;

Operation:

uv_tcp_t *tcp = malloc(sizeof(*tcp));
uv_tcp_init(tcp);
/* ... */
uv_read_start((uv_handle_t *)tcp, alloc_cb, read_cb);

UPDATE (02 Aug 2016)

It's also a good idea to use an adaptive strategy when fetching a buffer depending on the requested size and return pooled buffers only when a big chunk of data is requested (e.g. all reads and long writes). For other cases (e.g. most writes), return dummy buffers. This will help to avoid wasting pooled buffers yet maintaining acceptable allocation speed. For example:

void alloc_cb(uv_handle_t *handle, size_t size, uv_buf_t *buf) {
    int len = size; /* Requested buffer size */
    void *ptr = bufpool_acquire(handle->loop->data, &len);
    *buf = uv_buf_init(ptr, len);
}

void *bufpool_acquire(bufpool_t *pool, int *len) {
    int size = *len;
    if (size > DUMMY_BUF_SIZE) {
        buf = bufpool_dequeue(pool);
        if (buf) {
            if (size > BUF_SIZE) *len = BUF_SIZE;
            return buf;
        }
        size = DUMMY_BUF_SIZE;
    }
    buf = bufpool_alloc(0, size);
    *len = buf ? size : 0;
    return buf;
}

P.S. No need for buflen and bufpool_dummy with this snippet.

like image 179
neoxic Avatar answered Nov 10 '22 00:11

neoxic


If you are on Linux you are in luck. Linux kernel usually uses what is known as SLAB Allocator by default. The advantage of this allocator is that it reduces actual memory allocations by maintaining pools of recyclable blocks. What it means for you is that as long as you always allocate buffers of the same size (ideally pow2 sizes of PAGE_SIZE) you are okay with using malloc() on Linux.

If you are not on Linux (or FreeBSD or Solaris) or if you develop a cross platform application you may consider using glib and its Memory Slices that are a cross-platform implementation of the SLAB allocator. It uses native implementations on platforms that support it so using it on Linux will bring no advantage (I ran some tests myself). I'm sure there are other libraries that can do the same or you can implement it yourself.

like image 44
dtoux Avatar answered Nov 10 '22 01:11

dtoux