I'm implementing a set of common yet not so trivial (or error-prone) data structures for C and just came up with an idea that got me thinking.
The question in short is: what is the best way to implement two structures that use similar algorithms but have different interfaces, without having to copy-paste or rewrite the algorithm? By best, I mean most maintainable and debuggable.
I think it is obvious why you wouldn't want to have two copies of the same algorithm.
Say you have a structure (call it `map`) with a set of associated functions (`map_*()`). Since the map needs to map anything to anything, we would normally implement it taking a `void *key` and `void *data`. However, think of a map of `int` to `int`. In this case, you would need to store all the keys and data in another array and give their addresses to the `map`, which is not so convenient.
Now imagine if there was a similar structure (call it `mapc`, c for "copies") that during initialization takes `sizeof(your_key_type)` and `sizeof(your_data_type)` and, given `void *key` and `void *data` on insert, would use `memcpy` to copy the keys and data into the map instead of just keeping the pointers. An example of usage:
    int i;
    mapc m;
    mapc_init(&m, sizeof(int), sizeof(int));
    for (i = 0; i < n; ++i)
    {
        int j = rand(); /* whatever */
        mapc_insert(&m, &i, &j);
    }
which is quite nice, because I don't need to keep another array of `i`s and `j`s.
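To make the copying behaviour concrete, here is a minimal sketch of what such a `mapc` could look like. It is not the library's implementation: an unsorted linked list stands in for the balanced tree, keys are compared bytewise with `memcmp` (which assumes key types without padding bytes), and `mapc_get` and `mapc_free` are hypothetical helpers added only so the sketch can be exercised.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout: the key copy and the data copy live inline. */
typedef struct mapc_node {
    struct mapc_node *next;
    unsigned char bytes[];      /* key_size bytes of key, then data_size bytes of data */
} mapc_node;

typedef struct mapc {
    size_t key_size, data_size;
    mapc_node *head;            /* unsorted list standing in for the real tree */
} mapc;

void mapc_init(mapc *m, size_t key_size, size_t data_size)
{
    assert(m != NULL);
    m->key_size = key_size;
    m->data_size = data_size;
    m->head = NULL;
}

/* Bytewise key comparison; an assumption made for this sketch only. */
static mapc_node *mapc_lookup(const mapc *m, const void *key)
{
    for (mapc_node *n = m->head; n != NULL; n = n->next)
        if (memcmp(n->bytes, key, m->key_size) == 0)
            return n;
    return NULL;
}

/* Copies *key and *data into the map. Returns 0 on success, -1 if allocation fails. */
int mapc_insert(mapc *m, const void *key, const void *data)
{
    mapc_node *n = mapc_lookup(m, key);
    if (n == NULL) {
        n = malloc(sizeof *n + m->key_size + m->data_size);
        if (n == NULL)
            return -1;
        memcpy(n->bytes, key, m->key_size);
        n->next = m->head;
        m->head = n;
    }
    /* New node or existing key: store (or overwrite) the data copy. */
    memcpy(n->bytes + m->key_size, data, m->data_size);
    return 0;
}

/* Copies the stored data into *data_out. Returns 1 if the key was found, 0 otherwise. */
int mapc_get(const mapc *m, const void *key, void *data_out)
{
    const mapc_node *n = mapc_lookup(m, key);
    if (n == NULL)
        return 0;
    memcpy(data_out, n->bytes + m->key_size, m->data_size);
    return 1;
}

void mapc_free(mapc *m)
{
    while (m->head != NULL) {
        mapc_node *next = m->head->next;
        free(m->head);
        m->head = next;
    }
}
```

Because the map owns its copies, the loop variables `i` and `j` from the usage example can go out of scope or be overwritten without affecting what is stored.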
In the example above, `map` and `mapc` are very closely related. If you think about it, `map` and `set` structures and functions are also very similar. I have thought of the following ways to implement their algorithm only once and use it for all of them. Neither of them, however, is quite satisfying to me.
1. Use macros. Write the function code in a header file, leaving the structure-dependent parts as macros. For each structure, define the proper macros and include the file:
map_generic.h
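    /* Two-level paste: x##_insert alone would produce NAME_insert,
       because operands of ## are not macro-expanded. */
    #define PASTE_(a, b) a##b
    #define PASTE(a, b) PASTE_(a, b)
    #define INSERT(x) PASTE(x, _insert)

    int INSERT(NAME)(NAME *m, PARAMS)
    {
        // create node
        ASSIGN_KEY_AND_DATA(node);
        // get m->root
        // add to tree starting from root
        // rebalance from node to root
        // etc
    }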
map.c
    #define NAME map
    #define PARAMS void *key, void *data
    #define ASSIGN_KEY_AND_DATA(node) \
        do {\
            (node)->key = key;\
            (node)->data = data;\
        } while (0)
    #include "map_generic.h"
mapc.c
    #include <string.h> /* memcpy */

    #define NAME mapc
    #define PARAMS void *key, void *data
    #define ASSIGN_KEY_AND_DATA(node) \
        do {\
            memcpy((node)->key, key, m->key_size);\
            memcpy((node)->data, data, m->data_size);\
        } while (0)
    #include "map_generic.h"
This method is not half bad, but it's not so elegant.
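For comparison, the same idea can be sketched in a single file by putting the shared function body into one function-generating macro instead of an included header. This is a variation on the method above, not the same layout, and everything in it (`slot`, `slotc`, `DEFINE_SET`, the `_ASSIGN` macros) is a hypothetical name chosen for illustration. The "algorithm" is reduced to validation plus bookkeeping so that the structure-dependent step stands out.

```c
#include <assert.h>
#include <string.h>

/* Keeps only the pointer it is given. */
typedef struct slot {
    const void *value;
    int used;
} slot;

/* Keeps a copy of the pointed-to bytes; size must not exceed sizeof value. */
typedef struct slotc {
    unsigned char value[16];
    size_t size;
    int used;
} slotc;

/* Structure-dependent step, one macro per structure. */
#define SLOT_ASSIGN(s, v)  ((s)->value = (v))
#define SLOTC_ASSIGN(s, v) memcpy((s)->value, (v), (s)->size)

/* Shared code, written once. NAME is a macro parameter here, so NAME##_set
   pastes the actual structure name and no two-level paste is needed. */
#define DEFINE_SET(NAME, ASSIGN)                    \
    int NAME##_set(NAME *s, const void *value)      \
    {                                               \
        if (s == NULL || value == NULL)             \
            return -1;                              \
        ASSIGN(s, value);                           \
        s->used = 1;                                \
        return 0;                                   \
    }

DEFINE_SET(slot, SLOT_ASSIGN)    /* defines slot_set()  */
DEFINE_SET(slotc, SLOTC_ASSIGN)  /* defines slotc_set() */
```

The trade-off is the usual one for macro-generated code: the compiler sees two ordinary functions and can inline the assignment, but a debugger or compiler diagnostic points at the `DEFINE_SET` invocation rather than at a readable line of source.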
2. Use function pointers. For each part that is dependent on the structure, pass a function pointer.
map_generic.c
    int map_generic_insert(void *m, void *key, void *data,
            void (*assign_key_and_data)(void *, void *, void *, void *),
            void *(*get_root)(void *)) /* must return the root, not void */
    {
        // create node
        assign_key_and_data(m, node, key, data);
        root = get_root(m);
        // add to tree starting from root
        // rebalance from node to root
        // etc
    }
map.c
    static void assign_key_and_data(void *m, void *node, void *key, void *data)
    {
        map_node *n = node;
        (void)m; /* not needed when only the pointers are stored */
        n->key = key;
        n->data = data;
    }

    /* Returns void * so that it matches the callback type exactly */
    static void *get_root(void *m)
    {
        return ((map *)m)->root;
    }

    int map_insert(map *m, void *key, void *data)
    {
        return map_generic_insert(m, key, data, assign_key_and_data, get_root);
    }
mapc.c
    static void assign_key_and_data(void *m, void *node, void *key, void *data)
    {
        map_node *n = node;
        mapc *mc = m;
        memcpy(n->key, key, mc->key_size);
        memcpy(n->data, data, mc->data_size);
    }

    /* Returns void * so that it matches the callback type exactly */
    static void *get_root(void *m)
    {
        return ((mapc *)m)->root;
    }

    int mapc_insert(mapc *m, void *key, void *data)
    {
        return map_generic_insert(m, key, data, assign_key_and_data, get_root);
    }
This method requires writing more functions that could have been avoided in the macro method (as you can see, the code here is longer) and doesn't allow optimizers to inline the functions (as their definitions are not visible when `map_generic.c` is compiled).
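A compilable miniature of the function-pointer method might look like the following. It is a sketch under simplifying assumptions: a linked list replaces the tree, the shared routine receives the list head directly instead of calling a `get_root` hook, and `node`, `assign_fn` and `generic_insert` are hypothetical names. The hook returns `int` so that the copying variant can report an allocation failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct node {
    struct node *next;
    void *key;
    void *data;
} node;

/* Structure-dependent hook: fills a freshly created node. Returns 0 on success. */
typedef int (*assign_fn)(void *m, node *n, void *key, void *data);

/* Shared algorithm, written once: create a node, let the hook fill it, link it in. */
static int generic_insert(void *m, node **head, void *key, void *data,
                          assign_fn assign)
{
    node *n = calloc(1, sizeof *n);
    if (n == NULL)
        return -1;
    if (assign(m, n, key, data) != 0) {
        free(n);
        return -1;
    }
    n->next = *head;
    *head = n;
    return 0;
}

typedef struct map  { node *head; } map;
typedef struct mapc { node *head; size_t key_size, data_size; } mapc;

/* map: keep the caller's pointers. */
static int map_assign(void *m, node *n, void *key, void *data)
{
    (void)m;
    n->key = key;
    n->data = data;
    return 0;
}

/* mapc: keep private copies (sizes are assumed to be non-zero). */
static int mapc_assign(void *m, node *n, void *key, void *data)
{
    mapc *mc = m;
    n->key = malloc(mc->key_size);
    n->data = malloc(mc->data_size);
    if (n->key == NULL || n->data == NULL) {
        free(n->key);
        free(n->data);
        return -1;
    }
    memcpy(n->key, key, mc->key_size);
    memcpy(n->data, data, mc->data_size);
    return 0;
}

int map_insert(map *m, void *key, void *data)
{
    return generic_insert(m, &m->head, key, data, map_assign);
}

int mapc_insert(mapc *m, void *key, void *data)
{
    return generic_insert(m, &m->head, key, data, mapc_assign);
}
```

In this single-file form the compiler can see both the hooks and `generic_insert`, so it is free to inline the calls. The inlining concern applies once the shared routine moves into its own translation unit, unless link-time optimization is enabled.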
So, how would you go about implementing something like this?
Note: I wrote the code in the stack-overflow question form, so excuse me if there are minor errors.
Side question: Does anyone have a better idea for a suffix that says "this structure copies the data instead of the pointer"? I use `c`, which stands for "copies", but there could be a much better word for it in English that I don't know about.
I have come up with a third solution. In this solution, only one version of the map is written, the one that keeps a copy of data (`mapc`). This version would use `memcpy` to copy data. The other `map` is an interface to this, taking `void *key` and `void *data` pointers and sending `&key` and `&data` to `mapc` so that the address they contain would be copied (using `memcpy`).
This solution has the downside that a normal pointer assignment is done by `memcpy`, but it completely solves the issue otherwise and is very clean.
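A rough sketch of this wrapper follows. The `mapc` part is a deliberately tiny stand-in (unsorted list, bytewise key comparison), and `mapc_get`, `map_init` and `map_find` are hypothetical names not taken from the library. One consequence of the bytewise comparison in this sketch is that the wrapper compares keys by pointer identity. A real implementation would presumably take a comparison callback that dereferences the stored pointers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for mapc: copies key and data bytes into each node. */
typedef struct mapc_node {
    struct mapc_node *next;
    unsigned char bytes[];          /* key copy followed by data copy */
} mapc_node;

typedef struct mapc {
    size_t key_size, data_size;
    mapc_node *head;
} mapc;

void mapc_init(mapc *m, size_t key_size, size_t data_size)
{
    assert(m != NULL);
    m->key_size = key_size;
    m->data_size = data_size;
    m->head = NULL;
}

int mapc_insert(mapc *m, const void *key, const void *data)
{
    mapc_node *n = malloc(sizeof *n + m->key_size + m->data_size);
    if (n == NULL)
        return -1;
    memcpy(n->bytes, key, m->key_size);
    memcpy(n->bytes + m->key_size, data, m->data_size);
    n->next = m->head;
    m->head = n;
    return 0;
}

/* Copies the stored data into *data_out. Returns 1 if found, 0 otherwise. */
int mapc_get(const mapc *m, const void *key, void *data_out)
{
    for (const mapc_node *n = m->head; n != NULL; n = n->next) {
        if (memcmp(n->bytes, key, m->key_size) == 0) {
            memcpy(data_out, n->bytes + m->key_size, m->data_size);
            return 1;
        }
    }
    return 0;
}

/* The pointer-keeping map is only an interface over mapc. */
typedef struct map { mapc impl; } map;

void map_init(map *m)
{
    /* The "values" being copied are the pointers themselves. */
    mapc_init(&m->impl, sizeof(void *), sizeof(void *));
}

int map_insert(map *m, void *key, void *data)
{
    /* Pass the addresses of the pointer variables so mapc copies the pointer values. */
    return mapc_insert(&m->impl, &key, &data);
}

/* Returns the stored data pointer, or NULL if the key pointer is not present. */
void *map_find(const map *m, void *key)
{
    void *data = NULL;
    return mapc_get(&m->impl, &key, &data) ? data : NULL;
}
```

The cost mentioned above is visible in `map_insert`: what would have been two pointer assignments becomes two `memcpy` calls of `sizeof(void *)` bytes each, which compilers commonly reduce to plain loads and stores when the size is a compile-time constant but cannot here, since the size is read from the structure at run time.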
Alternatively, one can implement only the `map` and use an extra `vectorc` with `mapc`, which first copies the data to the vector and then gives the address to a `map`. This has the side effect that deletion from `mapc` would either be substantially slower, or leave garbage (or require other structures to reuse the garbage).
I came to the conclusion that careless users might use my library the way they write C++, copy after copy after copy. Therefore, I am abandoning this idea and accepting only pointers.
You roughly covered both possible solutions.
The preprocessor macros roughly correspond to C++ templates and have the same advantages and disadvantages.
The function pointers roughly correspond to C++ polymorphism and they are IMHO a cleaner and generally easier-to-use solution, but they bring some cost at runtime (in tight loops, a few extra function calls can be expensive).
I generally prefer the function calls, unless the performance is really critical.
There's also a third option that you haven't considered: you can create an external script (written in another language) to generate your code from a series of templates. This is similar to the macro method, but you can use a language like Perl or Python to generate the code. Since these languages are more powerful than the C pre-processor, you can avoid some of the potential problems inherent in doing templates via macros. I have used this method in cases where I was tempted to use complex macros like in your example #1. In the end, it turned out to be less error-prone than using the C preprocessor. The downside is that between writing the generator script and updating the makefiles, it's a little more difficult to get set up initially (but IMO worth it in the end).