Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementing different yet similar structure/function sets without copy-paste

I'm implementing a set of common yet not so trivial (or error-prone) data structures for C (here) and just came with an idea that got me thinking.

The question in short is, what is the best way to implement two structures that use similar algorithms but have different interfaces, without having to copy-paste/rewrite the algorithm? By best, I mean most maintainable and debug-able.

I think it is obvious why you wouldn't want to have two copies of the same algorithm.

Motivation

Say you have a structure (call it map) with a set of associated functions (map_*()). Since the map needs to map anything to anything, we would normally implement it taking a void *key and void *data. However, think of a map of int to int. In this case, you would need to store all the keys and data in another array and give their addresses to the map, which is not so convenient.

Now imagine if there was a similar structure (call it mapc, c for "copies") that during initialization takes sizeof(your_key_type) and sizeof(your_data_type) and given void *key and void *data on insert, it would use memcpy to copy the keys and data in the map instead of just keeping the pointers. An example of usage:

int i;
mapc m;
mapc_init(&m, sizeof(int), sizeof(int));
for (i = 0; i < n; ++i)
{
    int j = rand();  /* whatever */
    mapc_insert(&m, &i, &j);
}

which is quite nice, because I don't need to keep another array of is and js.

My ideas

In the example above, map and mapc are very closely related. If you think about it, map and set structures and functions are also very similar. I have thought of the following ways to implement their algorithm only once and use it for all of them. Neither of them however are quite satisfying to me.

  1. Use macros. Write the function code in a header file, leaving the structure dependent stuff as macros. For each structure, define the proper macros and include the file:

    map_generic.h
    
    #define INSERT(x) x##_insert
    
    int INSERT(NAME)(NAME *m, PARAMS)
    {
        // create node
        ASSIGN_KEY_AND_DATA(node)
        // get m->root
        // add to tree starting from root
        // rebalance from node to root
        // etc
    }
    
    map.c
    
    #define NAME map
    #define PARAMS void *key, void *data
    #define ASSIGN_KEY_AND_DATA(node) \
    do {\
        node->key = key;\
        node->data = data;\
    } while (0)
    #include "map_generic.h"
    
    mapc.c
    
    #define NAME mapc
    #define PARAMS void *key, void *data
    #define ASSIGN_KEY_AND_DATA(node) \
    do {\
        memcpy(node->key, key, m->key_size);\
        memcpy(node->data, data, m->data_size);\
    } while (0)
    
    #include "map_generic.h"
    

    This method is not half bad, but it's not so elegant.

  2. Use function pointers. For each part that is dependent on the structure, pass a function pointer.

    map_generic.c
    
    int map_generic_insert(void *m, void *key, void *data,
        void (*assign_key_and_data)(void *, void *, void *, void *),
        void (*get_root)(void *))
    {
        // create node
        assign_key_and_data(m, node, key, data);
        root = get_root(m);
        // add to tree starting from root
        // rebalance from node to root
        // etc
    }
    
    map.c
    
    static void assign_key_and_data(void *m, void *node, void *key, void *data)
    {
        map_node *n = node;
        n->key = key;
        n->data = data;
    }
    
    static map_node *get_root(void *m)
    {
        return ((map *)m)->root;
    }
    
    int map_insert(map *m, void *key, void *data)
    {
        map_generic_insert(m, key, data, assign_key_and_data, get_root);
    }
    
    mapc.c
    
    static void assign_key_and_data(void *m, void *node, void *key, void *data)
    {
        map_node *n = node;
        map_c *mc = m;
        memcpy(n->key, key, mc->key_size);
        memcpy(n->data, data, mc->data_size);
    }
    
    static map_node *get_root(void *m)
    {
        return ((mapc *)m)->root;
    }
    
    int mapc_insert(mapc *m, void *key, void *data)
    {
        map_generic_insert(m, key, data, assign_key_and_data, get_root);
    }
    

    This method requires writing more functions that could have been avoided in the macro method (as you can see, the code here is longer) and doesn't allow optimizers to inline the functions (as they are not visible to map_generic.c file).

So, how would you go about implementing something like this?

Note: I wrote the code in the stack-overflow question form, so excuse me if there are minor errors.

Side question: Anyone has a better idea for a suffix that says "this structure copies the data instead of the pointer"? I use c that says "copies", but there could be a much better word for it in English that I don't know about.


Update:

I have come up with a third solution. In this solution, only one version of the map is written, the one that keeps a copy of data (mapc). This version would use memcpy to copy data. The other map is an interface to this, taking void *key and void *data pointers and sending &key and &data to mapc so that the address they contain would be copied (using memcpy).

This solution has the downside that a normal pointer assignment is done by memcpy, but it completely solves the issue otherwise and is very clean.

Alternatively, one can only implement the map and use an extra vectorc with mapc which first copies the data to vector and then gives the address to a map. This has the side effect that deletion from mapc would either be substantially slower, or leave garbage (or require other structures to reuse the garbage).


Update 2:

I came to the conclusion that careless users might use my library the way they write C++, copy after copy after copy. Therefore, I am abandoning this idea and accepting only pointers.

like image 862
Shahbaz Avatar asked Jun 14 '12 13:06

Shahbaz


2 Answers

You roughly covered both possible solutions.

The preprocessor macros roughly correspond to C++ templates and have the same advantages and disadvantages:

  • They are hard to read.
  • Complex macros are often hard to use (consider type safety of parameters etc.)
  • They are just "generators" of more code, so in the compiled output a lot of duplicity is still there.
  • On other side, they allow compiler to optimize a lot of stuff.

The function pointers roughly correspond to C++ polymorphism and they are IMHO cleaner and generally easier-to-use solution, but they bring some cost at runtime (for tight loops, few extra function calls can be expensive).

I generally prefer the function calls, unless the performance is really critical.

like image 150
mity Avatar answered Sep 28 '22 01:09

mity


There's also a third option that you haven't considered: you can create an external script (written in another language) to generate your code from a series of templates. This is similar to the macro method, but you can use a language like Perl or Python to generate the code. Since these languages are more powerful than the C pre-processor, you can avoid some of the potential problems inherent in doing templates via macros. I have used this method in cases where I was tempted to use complex macros like in your example #1. In the end, it turned out to be less error-prone than using the C preprocessor. The downside is that between writing the generator script and updating the makefiles, it's a little more difficult to get set up initially (but IMO worth it in the end).

like image 37
bta Avatar answered Sep 28 '22 01:09

bta