
Improving a minimalistic OOP for microcontrollers using C, gcc, C99, and Macros with optimization

Tags: c, oop, pointers, gcc

I often have to program microcontrollers in C, because C++ compilers are frequently unavailable or, because of various bugs, cannot produce extremely small code. But OOP "syntactic sugar" is very convenient for making hardware programs more clearly encapsulated and easier to maintain, so I wanted to find out whether there is a way to do OOP syntax in C where as much of the OOP overhead as possible (when not needed) can be optimized out in a portable way; e.g. one that will optimize with gcc targeted at different microcontrollers, or perhaps by using gcc's preprocessor and a generic ANSI C compiler if gcc is not available for that microcontroller.

I found only threads like this one, Elegant way to emulate 'this' pointer when doing OOP in C?, which generally do OOP by embedding pointers into structs. That's not always what I want, because it wastes memory when I'm not interested in virtual methods or anything like that. I can always follow the coding style in the link where those features are needed, but I want to develop techniques for when they are not needed; i.e. I just want to be able to program using OOP paradigms with simple, easy-to-understand code (not necessarily C++, though I like C++) and still achieve minimal program memory usage when some OOP paradigms are not in use.

So I resorted to experimenting with gcc and C99, because gcc 3.2 or above is generally available for most platforms, and realized that I could use the sizeof operator together with gcc's typeof extension (typeof is not part of C99 itself) to index classes automatically (a 'trick' of sorts) from an unused, uninitialized union member (so classes must be unions with sub-structs), in order to access a compile-time constant lookup table created by macros, which binds data to methods and preserves type checking.
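A minimal sketch of the indexing trick described above, assuming two hypothetical classes `foo_ct` and `bar_ct`. The class index is encoded in the array length behind a pointer-to-array union member that is never initialized, and `sizeof` recovers it at compile time without evaluating the pointer. The names `CLASS_INDEX` and `tag`, and the `+ 1` offset (used here to avoid zero-length arrays, which are a GNU extension), are illustrative and not taken from voidbind.h.

```c
#include <assert.h>
#include <stddef.h>

/* Each class is a union; `tag` is never read or written at run time.
   The length of the array it points to encodes (class index + 1). */
typedef union {
    char (*tag)[0 + 1];                      /* class index 0 */
    struct { const char *name; int one; } m;
} foo_ct;

typedef union {
    char (*tag)[1 + 1];                      /* class index 1 */
    struct { int x; } m;
} bar_ct;

/* sizeof does not evaluate its operand here, so the uninitialized
   pointer is never dereferenced; the result is a constant expression. */
#define CLASS_INDEX(var) (sizeof(*(var).tag) - 1)

/* Small self-check that exercises the macro on two variables. */
void check_indexes(void) {
    foo_ct a;
    bar_ct b;
    assert(CLASS_INDEX(a) == 0);
    assert(CLASS_INDEX(b) == 1);
}
```

Because the index never exists as run-time data, the union costs no more memory than its largest member.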

Specifically, GCC can optimize out const structures and arrays when their members are only accessed as constant expressions, so I thought I might be able to use that to build a macro-based compile-time binding system where the OOP overhead is handled in GCC and actually optimizes out of the final binary.
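The property this relies on can be sketched as follows, under the assumption of a hypothetical `counter_t` class with two methods. Whether the table is really removed depends on the compiler version and optimization flags, so this is something to verify on the actual target rather than a guarantee.

```c
#include <assert.h>

typedef struct { int count; } counter_t;

/* Method table type for the hypothetical counter class. */
typedef struct {
    void (*reset)(counter_t *);
    int  (*bump)(counter_t *, int);
} counter_mt;

static void counter_reset(counter_t *self) { self->count = 0; }
static int  counter_bump(counter_t *self, int by) { return self->count += by; }

/* static const: the table has internal linkage and never changes, so an
   optimizer is free to replace each lookup with a direct (or inlined) call. */
static const counter_mt counter_methods = {
    .reset = counter_reset,
    .bump  = counter_bump,
};

/* Every member access below uses a constant table and a constant member. */
int run_counter(void) {
    counter_t c;
    counter_methods.reset(&c);
    counter_methods.bump(&c, 2);
    return counter_methods.bump(&c, 3);
}
```

One way to check the result is to compile with `gcc -O2 -S` and search the assembly output for `counter_methods`.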

With this system, I can now do variadic macro method calls like M( a , init, "with", "any", "parameters", 7 ), which looks up variable a's type and calls the method init with a variable number of parameters.

See the code examples below and try them out; it's simpler than the explanation. Use gcc -E to see the macro expansions. Note that for ANSI-only compilers the typeof() operator will have to be replaced by a (void*) typecast; type checking only works with GCC.

The code is cut and paste-able into a text editor, with filename on first line, and it does compile and run on normal PC systems.

Although I did succeed in getting rid of the individual pointer in every struct that "points back to" a class's list of methods, which saves memory on a memory-limited microcontroller, I wasn't able to figure out how to get the compiler to optimize out unused method pointers. I had to use (void*) pointers to hold the classes in an array, and those require a memory address (the address of a struct) and a linker instance, so they don't optimize out.

So I was wondering whether anyone knows a way to improve my solution by making some kind of initialized method struct that would optimize out (have no linker address) after compilation, e.g. when its members are only accessed as constant expressions in the code. In essence, I need to be able to look up an element in an array where the initialized portion of each array element is a different classXXX_mt, rather than a list of addresses of classXXX_mt structs all typecast to (void*).

There are two other improvements I'd like help with, if anyone can think of a simple solution. First, the cpp (C preprocessor) doesn't allow defining new macros from within a previous macro by token concatenation (as far as I know), so I have to make fixed-length macro lists (a maximum of 10 in my example) to hold class definitions, which means I can have at most 10 classes in a program. Ideally, I would like a way to make my code more generic, so that cpp could create variable-length lists on the fly. The problem is related to the inability of the C preprocessor to "count" automatically.
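One common workaround for fixed-length lists is an X-macro, sketched here under the assumption that all classes can be named in one central place (which differs from the include-per-class scheme used in this question). `CLASS_LIST`, `AS_ENUM`, `AS_STRING`, and the class names are hypothetical.

```c
#include <assert.h>

/* Hypothetical central class list: add a line to add a class. */
#define CLASS_LIST(X) \
    X(class1)         \
    X(class2)         \
    X(class3)

/* Expand the list into an enum; CLASS_COUNT ends up as the list length,
   so the preprocessor never has to count anything itself. */
#define AS_ENUM(name) name##_id,
enum { CLASS_LIST(AS_ENUM) CLASS_COUNT };

/* Expand the same list into a parallel table, here of class names. */
#define AS_STRING(name) #name,
static const char *const class_names[CLASS_COUNT] = { CLASS_LIST(AS_STRING) };

/* Look up a class name by its generated id. */
const char *class_name(int id) {
    return (id >= 0 && id < CLASS_COUNT) ? class_names[id] : "?";
}
```

The list can grow to any length without touching the code that consumes it.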

Secondly, I tried to use anonymous structs with newer versions of GCC, to get rid of the extra 'm' required to access member data in ISO C (e.g. foo.m.mydata), by deleting the 'm' name from the class union definition and compiling with gcc -std=c11. It simply gave me errors claiming the struct defined nothing, so anonymous structs inside unions don't seem to work even in GCC 4.8, although they are supposed to. How can I get anonymous structs to work?

Below is the example of how I tested and implemented an include file, voidbind.h, which builds a list of classes and statically links the methods to the variables of that class type.

Ultimately, the system allows me to program like this example; which I compiled with gcc 4.0 to 4.9 with no problems:

//classtest.c
#ifndef MACROCHECK  // Don't macro expand stdio.h, it's ugly...
#include <stdio.h>  // to see macros, do gcc -D MACROCHECK -E classtest.c
#endif
#include "class1.h" // include example class, library.

#define _VOID_FINALIZE
#include "voidbind.h" // Make class list finalized, no more classes allowed

int main( void ) {
    class1_ct a; // types ending in _ct are the macro created class types
    class2_ct b;

    M( a , init ); // Call method of variable, a, and the function init.
    printf("a=%s %s\n",a.m.name, M( b, tryme, "echo is this" ) ); 
    // I'd love to be rid of .m. in the previous line using anonymous struct
}

Next is the Class definition / header file, for both class1 and class2, showing how the macro pre-processor is used to create classes of data bound to methods and the _ct type; normally this would probably be broken up into two header files, and two libraries; but I'm just abusing the header by putting all the code together for simplicity.

//class1.h
#ifndef _class1_h
#define _class1_h


// Define the data type structure for class1
typedef struct {
    char* name;
    int   one;
} class1_t;

// Define the method type structure for class1 
union class1_ctt ; // class type tag, incomplete tag type for class1_ct
typedef struct { // method prototypes
    void (*init)( union class1_ctt* ); // passed a pointer to class1_ct
} class1_mt;

// bind class1_mt and class1_t together into class1_ct
#define _VOID_NEW_CLASS class1
#include "voidbind.h"

// Begin class2 definition
typedef struct { // define data type for class2
    int x;
} class2_t;

union class2_ctt ; // class type tag, forward definition
typedef struct { // method prototypes for class2
    char* (*tryme)( union class2_ctt*, char* echo );
} class2_mt;

// bind class2_t and class2_mt together into class2_ct
#define _VOID_NEW_CLASS class2
#include "voidbind.h"

// --------------------------------------------- Start library code
// This would normally be a separate file, and linked in
// but as we're doing a test, this is in the header instead...

//#include <class1.h>

void class1_init( class1_ct* self ) {
    self->m.name = "test";
    self->m.one=5;  
}

// Define class1's method type (_mt) instance of linker data (_ld):
// voidbind.h when it creates classes, expects an instance of the
// method type (_mt) named with _mt_ld appended to link the prototyped
// methods to C functions.  This is the actual "binding" information
// and is the data that I can't get to "optimize out", eg: when there
// is more than one method, and some of them are not used by the program

class1_mt class1_mt_ld = {
    .init=class1_init
};

// ----------- CLASS2 libcode ----

char* class2_tryme( class2_ct* self, char* echo ) {
    return echo;
}

// class2's method type (_mt) instance of linker data (_ld).
class2_mt class2_mt_ld = { // linker information for method addresses
    .tryme=class2_tryme
};

// --------------------------------------------- End of library code

#endif

Finally comes voidbind.h. This is the heart of the system: getting cpp to make a compile-time constant list of void* pointers to method structs. The void* list will always optimize out, as long as everything passed in is a compile-time constant. (But the structs in the list will not completely optimize out, :( even if they are constants.)

For this idea to work, I had to figure out a way to make cpp count how many times the voidbind header file was #included, in order to automatically make a list of class pointers. Since the macro preprocessor cannot do addition or define macros which change based on a previous definition of the same macro name, I had to use inline functions to "save" the pointer to the class method struct (_mt) from one pass to the next. That's what forces me to use void* pointers, though it might be solvable in another way.
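As a possible alternative to counting includes by hand, GCC (4.3 and later) and Clang provide the non-standard predefined macro `__COUNTER__`, which yields a new integer on every expansion. The sketch below assumes such a compiler is available, which may not hold for the older toolchains mentioned above; `ID_BASE` and `NEXT_CLASS_ID` are hypothetical names.

```c
#include <assert.h>

/* __COUNTER__ is a compiler extension, not ISO C: each expansion produces
   the next integer within the translation unit. */
enum { ID_BASE = __COUNTER__ };            /* remember where counting starts */

/* Ids are taken relative to ID_BASE, so earlier uses of __COUNTER__
   (for example inside other headers) do not shift them. */
#define NEXT_CLASS_ID (__COUNTER__ - ID_BASE - 1)

enum { class1_id = NEXT_CLASS_ID };        /* first use after the base */
enum { class2_id = NEXT_CLASS_ID };
enum { class3_id = NEXT_CLASS_ID };
enum { CLASS_COUNT = NEXT_CLASS_ID };      /* one past the last class id */

/* Expose the count through a function for callers that want it at run time. */
int class_count(void) { return CLASS_COUNT; }
```

Each class header could take its id this way, removing the fixed limit of ten, at the cost of depending on the extension.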

// voidbind.h
// A way to build compile time void pointer arrays
// These arrays are lists of constants that are only important at compile
// time and which "go away" once the compilation is finished (eg:static bind).
// Example code written by: Andrew F. Robinson of Scappoose


#ifdef _VOID_WAS_FINALIZED //#{
#error voidbind_h was included twice after a _VOID_FINALIZE was defined
#endif //#}

// _VOID_FINALIZE, define only after all class headers have been included. 
// It will simplify the macro expansion output, and minimize the memory impact
// of an optimization failure or disabling of the optimization in a bad compiler
// in hopes of making the program still work.

#ifdef _VOID_FINALIZE //#{
#define _VOID_WAS_FINALIZED
#undef _VOID_BIND
static inline void* _VOID_BIND( int x ) {
    return _VOID_BIND_OBJ[ x ];
}
#else

// Make sure this file has data predefined for binding before being
// included, or else error out so the user knows it's missing a define.

#if ! defined( _VOID_NEW_OBJ ) && ! defined( _VOID_NEW_CLASS ) //#{
#error missing a define of _VOID_NEW_OBJ or _VOID_NEW_CLASS
#endif //#}


// Initialize a macro (once) to count the number of times this file
// has been included; eg: since one object is to be added to the void
// list each time this file is #included. ( _VOID_OBJn ) 

#ifndef _VOID_OBJn //#{
#define _VOID_OBJn _ERROR_VOID_OBJn_NOT_INITIALIZED_

// Initialize, once, macros to do name concatenations 
#define __VOID_CAT( x, y ) x ## y
#define _VOID_CAT( x, y ) __VOID_CAT( x , y )

// Initialize, once, the empty void* list of pointers for classes, objs.
#define _VOID_BIND_OBJ (void* []){\
    _VOID_OBJ0() , _VOID_OBJ1() , _VOID_OBJ2() , _VOID_OBJ3() , _VOID_OBJ4()\
 ,  _VOID_OBJ5() , _VOID_OBJ6() , _VOID_OBJ7() , _VOID_OBJ8() , _VOID_OBJ9()\
}
// Define a function macro to return the list, so it can be easily
// replaced by a _FINALIZED  inline() function, later
#define _VOID_BIND(x) _VOID_BIND_OBJ[ x ]

// All void pointers are initially null macros.  So the void list is 0.
#define _VOID_OBJ0()  0
#define _VOID_OBJ1()  0
#define _VOID_OBJ2()  0
#define _VOID_OBJ3()  0
#define _VOID_OBJ4()  0
#define _VOID_OBJ5()  0
#define _VOID_OBJ6()  0
#define _VOID_OBJ7()  0
#define _VOID_OBJ8()  0
#define _VOID_OBJ9()  0
#endif //#}

// Figure out how many times this macro has been called, by
// checking for how many _VOID_OBJn() function macros have been
// replaced by inline functions

#undef _VOID_OBJn

#if defined( _VOID_OBJ0 ) // #{
#undef _VOID_OBJ0
#define _VOID_OBJn 0
#elif defined( _VOID_OBJ1 )
#undef _VOID_OBJ1
#define _VOID_OBJn 1
#elif defined( _VOID_OBJ2 )
#undef _VOID_OBJ2
#define _VOID_OBJn 2
#elif defined( _VOID_OBJ3 )
#undef _VOID_OBJ3
#define _VOID_OBJn 3
#elif defined( _VOID_OBJ4 )
#undef _VOID_OBJ4
#define _VOID_OBJn 4
#elif defined( _VOID_OBJ5 )
#undef _VOID_OBJ5
#define _VOID_OBJn 5
#elif defined( _VOID_OBJ6 )
#undef _VOID_OBJ6
#define _VOID_OBJn 6
#elif defined( _VOID_OBJ7 )
#undef _VOID_OBJ7
#define _VOID_OBJn 7
#elif defined( _VOID_OBJ8 )
#undef _VOID_OBJ8
#define _VOID_OBJn 8
#elif defined( _VOID_OBJ9 )
#undef _VOID_OBJ9
#define _VOID_OBJn 9 
#else
#error Attempted to define more than ten objects
#endif //#}

// -------------------------------------------------------
// If the user defines _VOID_NEW_CLASS
// Create a union of the two class structs, xxx_t and xxx_mt
// and call it xxx_ct.  It must also be compatible with xxx_ctt, the tag
// which allows forward definitions in the class headers.

#ifdef  _VOID_NEW_CLASS //#{
#ifndef M  //#{
#define M( var , method , ... )\
        (( (typeof(var._VOIDBIND_T))_VOID_BIND( sizeof(*(var._VOIDBIND)) ) )->\
        method( & var , ## __VA_ARGS__ ))
#endif //#}
extern _VOID_CAT( _VOID_NEW_CLASS , _mt ) _VOID_CAT( _VOID_NEW_CLASS , _mt_ld );
typedef union _VOID_CAT( _VOID_NEW_CLASS, _ctt ) {
    char (*_VOIDBIND)[ _VOID_OBJn ];
    _VOID_CAT( _VOID_NEW_CLASS , _mt ) *_VOIDBIND_T;
    _VOID_CAT( _VOID_NEW_CLASS , _t ) m ;
} _VOID_CAT( _VOID_NEW_CLASS , _ct );

static inline void* (_VOID_CAT( _VOID_OBJ , _VOID_OBJn )) ( void ) {
    return & _VOID_CAT( _VOID_NEW_CLASS, _mt_ld );
}
#undef _VOID_NEW_CLASS
#else // ---------- Otherwise, just bind whatever object was passed in
static inline void* _VOID_CAT( _VOID_OBJ , _VOID_OBJn ) (void) {
    return (void*) & _VOID_NEW_OBJ ;
}
#undef _VOID_NEW_OBJ
#endif //#}

// End of Macros to define a list of pointers to class method structures
// and to bind data types to method types.

#endif //#}
Andrew of Scappoose, asked Apr 28 '15 03:04


3 Answers

In general, what you are asking for is C++. The examples you posted are most likely going to be as efficient or more efficient when built with a C++ compiler.

Often on embedded targets you have far outdated versions of gcc that generate bad code for c++ or don't support all the gory c++ details.

You can try running ${your_arch_prefix}-g++ -nostdlib -nostdinc, which gives you C++ syntax in the parser without all the things that waste space. If you want to disable other things, you can add -fno-rtti -fno-exceptions, which remove runtime type information and exception support (see this question).

Since the C++ front end is part of the same GCC code base, this may still work even though C++ isn't officially supported by your microcontroller vendor (sometimes you can also try compiling the vendor-specific version yourself and adding c++ to the languages in the configure script).

This is usually considered superior to trying to invent your own OOP-like macro DSL (domain-specific language).

That being said, if you don't want to go down this path and don't want to use hand-crafted vtables (as in your link), the simplest thing to do is to adopt coding conventions. If you don't need polymorphism, the code below is sufficient. You can define your struct and functions in a .c file and put the declarations in a header. The function below is called directly, so it's not in a vtable, and its first parameter plays the role of the this pointer in C++. struct impl is the actual data that the object holds, not a vtable or similar.

struct impl;
struct impl *make_impl(void);
// don't name the parameter `this`: it is a reserved keyword in C++
void do_bar(struct impl *myThis, int bar);
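To make the convention concrete, here is one way the two halves could look, sketched under some assumptions: the accessor `get_bar`, the single `bar` field, and the static pool (chosen on the assumption that the target has no heap) are all hypothetical and not part of the declarations above.

```c
#include <assert.h>
#include <stddef.h>

/* --- impl.h: everything callers get to see --- */
struct impl;                                /* opaque type */
struct impl *make_impl(void);
void do_bar(struct impl *myThis, int bar);
int  get_bar(const struct impl *myThis);    /* hypothetical accessor */

/* --- impl.c: private definition, invisible to callers --- */
struct impl { int bar; };

/* Fixed pool instead of malloc, assuming a heapless microcontroller. */
#define IMPL_POOL_SIZE 2
static struct impl pool[IMPL_POOL_SIZE];
static size_t pool_used;

struct impl *make_impl(void) {
    if (pool_used == IMPL_POOL_SIZE)
        return NULL;                        /* pool exhausted */
    struct impl *self = &pool[pool_used++];
    self->bar = 0;
    return self;
}

/* Plain functions called directly: no vtable, no per-object pointer. */
void do_bar(struct impl *myThis, int bar)  { myThis->bar += bar; }
int  get_bar(const struct impl *myThis)    { return myThis->bar; }
```

Calls are ordinary direct calls, so unused functions can be dropped by the linker (for example with section garbage collection, where the toolchain supports it).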

If you want polymorphism, look at what the Linux kernel does: it explicitly embeds the vtable in the object and uses macros to extract and initialize it.

Look at the definition of a char device, for instance,

and look at how people instantiate it in code and headers. Look at the container_of macro and understand how the media_entity_to_video_device cast works. (If this is too little context for you, look at the book Linux Device Drivers (LDD3).)
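The pattern can be sketched outside the kernel like this. The `container_of` below is a simplified version (the kernel's adds type checking), and the `shape`, `shape_ops`, and `rect` types are hypothetical stand-ins for the kernel structures.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: recover the enclosing object from a pointer
   to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct shape;
struct shape_ops { int (*area)(const struct shape *); };

/* The "base class": the vtable pointer is embedded explicitly. */
struct shape { const struct shape_ops *ops; };

/* A "derived" object embeds the base as an ordinary member. */
struct rect { int w, h; struct shape base; };

static int rect_area(const struct shape *s) {
    const struct rect *r = container_of(s, struct rect, base);
    return r->w * r->h;
}

static const struct shape_ops rect_ops = { .area = rect_area };

void rect_init(struct rect *r, int w, int h) {
    r->w = w;
    r->h = h;
    r->base.ops = &rect_ops;
}

/* Polymorphic call: works for any object that embeds a struct shape. */
int shape_area(const struct shape *s) { return s->ops->area(s); }
```

This costs one pointer per object, which is the overhead the question is trying to avoid, so it fits best where run-time polymorphism is actually needed.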

I know your code works, and you should be proud to understand what you are doing. But if you show your code to other people, they expect you to write either C or C++. If you are in C and are missing OOP, I would try to write the code in a way that lets others easily grasp what you are doing. Using macros to extract function pointers or get a polymorphic member is usually fine; hiding function calls and generating structs in macros is often unreadable, and people have to run gcc -E to see your creations expanded by the preprocessor just to understand what they are actually calling.

Edit

I've had a very quick shot at generating C code from clang++. According to this SO question and this one, the commands should be:

$ clang++ -std=c++11 -S -emit-llvm -o out main.cc # Worked
$ llc -march=c out
llc: error: invalid target 'c'.

$ clang++ --version
clang version 3.7.0 (trunk 232670)
Target: x86_64-unknown-linux-gnu
Thread model: posix

It seems the clang C backend has been removed (see also these sources resurrecting the C-backend code). That being said, you could also look into writing a backend for your target platform, but I think that's definitely over-engineered.

Alexander Oh, answered Nov 14 '22 22:11


For the side question, you can use -std=gnu99 to get C99 with gnu extensions (like anonymous struct and union members within structs and unions).
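A minimal sketch of what this looks like, assuming a simplified `class1_ct` with a stand-in `bind_tag` member in place of the real binding members. In both gnu99 and C11 the anonymous struct has to be defined inline, without a tag. Writing only a typedef name such as `class1_t;` as the member is not an anonymous struct in ISO C, which is likely why gcc reports that the declaration declares nothing; according to the GCC manual, that form is accepted only with -fms-extensions.

```c
#include <assert.h>

typedef union {
    char (*bind_tag)[1];        /* stand-in for the unused binding member */
    struct {                    /* defined inline: no tag, no member name */
        const char *name;
        int one;
    };                          /* anonymous: members join the union's scope */
} class1_ct;

/* Members are reached directly, without the intermediate `.m.` */
void class1_init(class1_ct *self) {
    self->name = "test";
    self->one  = 5;
}
```

For the macro scheme in the question, this means the data struct body would have to be pasted into the union definition itself, or the code compiled with -fms-extensions so the existing typedef can be reused.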

luser droog, answered Nov 14 '22 22:11


The question mentions -std=c11, so I guess that use of _Generic is OK in this situation.

Since what you appear to be asking for is a way to statically resolve methods from a shared name based on argument type, it makes some sense to look at overloading(/static polymorphism/ad-hoc polymorphism/etc.) as the basis for your system's operation, rather than trying to optimize a pattern generally intended for runtime resolution. _Generic is a static type->value selection operator that is intended specifically for helping with situations like this. It allows you to macro-expand the type-selection code directly into the calling expression and guarantees it will be removed at compile-time, which is exactly what you need.

Since it's an expression operator, _Generic has to list all of the types it's going to operate on in the expression. This means something has to be clustered, which isn't a perfect fit for your OOP strategy. Conventional overloading strategies cluster the function definitions, which would mess up trying to organize methods into classes; however, if you're willing to make an explicit list of all classes in use in your program (i.e. cluster the types instead) it should still be possible to achieve static resolution in a similar way.

e.g. (rough example):

#include <stdio.h>

// shared method table structure for all classes
typedef struct {
    void (* init)( void* );
    char* (* tryme)( void*, char* echo );
} poly_method_table;

// define class1
typedef struct {
    char* name;
    int   one;
} class1;
// Methods take void* so their types match the shared table's function pointers.
void class1_init( void* self_ ) {
    class1* self = self_;
    self->name = "test";
    self->one=5;
}
const poly_method_table class1_mt = {
    .init = class1_init
};

// define class2
typedef struct {
    int x;
} class2;
char* class2_tryme( void* self, char* echo ) {
    (void) self; // unused
    return echo;
}
const poly_method_table class2_mt = {
    .tryme = class2_tryme
};

// global lookup table
const poly_method_table * table_select[] = {
    &class1_mt,
    &class2_mt,
};
#define M(MSG, THIS, ...) table_select[_Generic((THIS), \
    class1 *: 0, \
    class2 *: 1, \
    default: "error")]->MSG((THIS), ## __VA_ARGS__)


int main( void ) {
    class1 a;
    class2 b;

    M( init, &a );
    printf("a=%s %s\n",a.name, M( tryme, &b, "echo is this" ) );
}

The method operator M produces a constant lookup value into the global table-of-vtables (instead of trying to retrieve the vtable from the object itself). With enough const declarations I would expect a decent optimizer to be able to remove this and go straight to the selected function, since there's no runtime variance in which vtable gets selected.

Since you're already using GNU extensions (i.e. ,## for method calls), you could improve this by using typeof to cast the vtable lookup to a specialized type for each class (instead of having a single vtable class that supports all polymorphic method names), potentially reducing size somewhat and making room for further overloading at the method level.
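A variation on this idea, sketched under a few assumptions: instead of casting with typeof, `_Generic` can return a pointer to a per-class method table directly, which removes the shared table and the void* parameters. The helper macros `MT` and `FIRST` and the `_methods` tables are hypothetical names, and `FIRST` is used only to avoid the GNU `, ##` extension.

```c
#include <assert.h>

typedef struct { const char *name; int one; } class1;
typedef struct { int x; } class2;

/* One method-table type per class: no void*, no unused slots. */
typedef struct { void (*init)(class1 *); } class1_mt;
typedef struct { const char *(*tryme)(class2 *, const char *); } class2_mt;

static void class1_init(class1 *self) { self->name = "test"; self->one = 5; }
static const char *class2_tryme(class2 *self, const char *echo) {
    (void)self;                 /* unused in this sketch */
    return echo;
}

static const class1_mt class1_methods = { .init  = class1_init };
static const class2_mt class2_methods = { .tryme = class2_tryme };

/* _Generic selects the table from the static type of THIS. With no
   default branch, an unlisted type is rejected at compile time. */
#define MT(THIS) _Generic((THIS), \
    class1 *: &class1_methods,    \
    class2 *: &class2_methods)

/* FIRST extracts the object pointer from the argument list. */
#define FIRST(a, ...) a
#define M(MSG, ...) (MT(FIRST(__VA_ARGS__, 0))->MSG(__VA_ARGS__))
```

With this shape a call such as `M( init, &a )` is fully type-checked against `class1_mt`, and each table is a separate static const object that the compiler can consider for removal independently.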

You could remove the annoying repetition in the definitions of table_select and M with a FOR_EACH macro (it would automatically fill out the table, the middle of the _Generic block, and an enum to build indexes), e.g.:

#define CLASSES class1, class2 //etc.

#define BUILD_ENUM(class) class ## _enum,
#define BUILD_SELECTOR(class) &class ## _mt,
#define SELECT_CLASS(class) class *: class ## _enum,

#define M(MSG, THIS, ...) table_select[_Generic((THIS), \
  FOR_EACH(SELECT_CLASS, CLASSES) \
  default: "error")]->MSG((THIS), ## __VA_ARGS__)

enum { FOR_EACH(BUILD_ENUM, CLASSES) };
const poly_method_table * table_select[] = {
    FOR_EACH(BUILD_SELECTOR, CLASSES)
};

(you can find suitable definitions of FOR_EACH elsewhere on SO)
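For reference, one possible definition is sketched below. It is an assumption rather than the specific macro the answer has in mind: it handles one to four arguments (extend the `FE_n` chain for more), and the `BUILD_NAME` generator and three-class list are hypothetical.

```c
#include <assert.h>

/* Apply F to each of 1..4 arguments. */
#define FE_1(F, a)      F(a)
#define FE_2(F, a, ...) F(a) FE_1(F, __VA_ARGS__)
#define FE_3(F, a, ...) F(a) FE_2(F, __VA_ARGS__)
#define FE_4(F, a, ...) F(a) FE_3(F, __VA_ARGS__)

/* The argument list shifts the FE_n names so that NAME lands on the
   one matching the argument count; the trailing 0 keeps "..." non-empty. */
#define FE_PICK(_1, _2, _3, _4, NAME, ...) NAME
#define FOR_EACH(F, ...) \
    FE_PICK(__VA_ARGS__, FE_4, FE_3, FE_2, FE_1, 0)(F, __VA_ARGS__)

#define CLASSES class1, class2, class3

#define BUILD_ENUM(class) class ## _enum,
#define BUILD_NAME(class) #class,

/* Generated enum: class1_enum = 0, class2_enum = 1, class3_enum = 2. */
enum { FOR_EACH(BUILD_ENUM, CLASSES) class_count };

/* Generated table of class names, in the same order as the enum. */
static const char *const class_names[] = { FOR_EACH(BUILD_NAME, CLASSES) };
```

The same `FOR_EACH` can be dropped into the `M` macro and `table_select` definitions above without changing them.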

Leushenko, answered Nov 14 '22 22:11