 

Embedded C data storage module design

I'm in the process of designing an embedded C data storage module. It will be included by files/modules that want access to this "shared" system-wide data. Multiple tasks aggregate dozens of inputs (GPIO, CAN, I2C/SPI/SSP data, etc.) and store those values through the API. Then, other tasks can access the data safely through the API. The system is an embedded app with an RTOS, so mutexes are used to protect the data. These ideas will be used regardless of the implementation.

I've designed something like this in the past, and I'm trying to improve upon it. I'm currently halfway through a new implementation and I'm running into a few hiccups and would really benefit from a fresh perspective.

Quick rundown of the requirements of this module:

  • Ideally, there would be one interface that can access the variables (one get, one set).
  • I'd like to return different variable types (floats, ints, etc). This means macros are probably needed.
  • I'm not pressed for code space, but it's always a concern.
  • Quick gets/sets are absolutely paramount (which means storing in strings à la XML/JSON is out).
  • No new variables need to be added during runtime. Everything is statically defined at boot.

    The question is how would you go about designing something like this? Enumerations, structures, accessors, macros, etc? I'm not looking for code here, just discussing general overall design ideas. If there's a solution on the internet that addresses these sorts of things, perhaps even just a link is sufficient.

    asked Jan 25 '11 by Jeff Lamb


    3 Answers

    I've been in this situation a couple times myself. Every time I've ended up "rolling my own", and I definitely don't suffer from Not Invented Here (NIH) syndrome. Sometimes the space, processing turnaround time, or reliability/recoverability requirements just make that the least painful path.

    So, rather than writing the great American novel on the topic, I'll just throw out some thoughts here, as your question is pretty broad (but thank you for at least forming a question & providing background).

    Is C++ on the table? Inline functions, templates, and some of the Boost libraries might be useful here. But I'm guessing this is straight-up C.

    If you're using C99, you can at least use inline functions, which are a step above macros when it comes to type safety.

    You might want to think about using several mutexes to protect different parts of the data; even though the updates are quick, you might want to break up the data into sections (e.g. configuration data, init data, error logging data, trace data, etc.) and give each its own mutex, reducing the funnel/choke points.
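For illustration, a per-section locking sketch along those lines. POSIX mutexes stand in for whatever primitive your RTOS provides, and the section names and fields are made up, not from the question:

```c
#include <pthread.h>
#include <stdint.h>

/* Two independent data sections, each with its own lock, so a slow
 * update in one section never blocks readers of the other. */
static struct {
    pthread_mutex_t lock;          /* stand-in for the RTOS mutex type */
    uint32_t rpm;
    uint32_t speed;
} can_data = { PTHREAD_MUTEX_INITIALIZER, 0, 0 };

static struct {
    pthread_mutex_t lock;
    uint32_t raw[8];
} adc_data = { PTHREAD_MUTEX_INITIALIZER, { 0 } };

void can_set_rpm(uint32_t v)
{
    pthread_mutex_lock(&can_data.lock);
    can_data.rpm = v;
    pthread_mutex_unlock(&can_data.lock);
}

uint32_t can_get_rpm(void)
{
    pthread_mutex_lock(&can_data.lock);
    uint32_t v = can_data.rpm;
    pthread_mutex_unlock(&can_data.lock);
    return v;
}

uint32_t adc_get(unsigned ch)
{
    pthread_mutex_lock(&adc_data.lock);
    uint32_t v = adc_data.raw[ch & 7];   /* clamp to table size */
    pthread_mutex_unlock(&adc_data.lock);
    return v;
}
```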

    You could also consider making all access to the data go through a server task. All reads & writes go through an API which communicates with the server task. The server tasks pulls reads & write requests in order from its queue, handles them quickly by writing to a RAM mirror, sending responses if needed (at least for read requests), and then buffers data to NVM in the background, if necessary. Sounds heavyweight compared to simple mutexes but it has its advantages in certain use cases. Don't know enough about your application to know if this is a possibility.
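The server-task idea might be sketched like this. The request format and names are hypothetical; the transport (RTOS queue, pipe, mailbox) is whatever your kernel provides:

```c
#include <stdint.h>

/* Hypothetical request format for the server task's queue. */
enum op { OP_READ, OP_WRITE };

struct data_req {
    enum op  op;
    uint16_t key;
    uint32_t value;        /* in for writes, out for reads */
};

/* RAM mirror owned exclusively by the server task -- no locks needed,
 * because only this one task ever touches it. */
static uint32_t mirror[256];

/* Called once per request pulled from the queue; NVM write-back
 * would be buffered here and flushed in the background. */
void server_handle(struct data_req *req)
{
    if (req->op == OP_WRITE)
        mirror[req->key & 255] = req->value;
    else
        req->value = mirror[req->key & 255];
}
```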

    One thing I will say, is the idea of getting/setting by a tag (e.g. maybe a list of enums like CONFIG_DATA, ADDRESS_DATA, etc.) is a huge step forward from directly addressing data (e.g. "give me the 256 bytes at address 0x42000"). I've seen many, many shops suffer great pain when the whole physical-addressing scheme finally breaks down & they need to re-factor / re-design. Try to keep the "what" decoupled from the "how" - clients shouldn't have to know or care where stuff is stored, how big it is, etc. (You might already know all this, sorry if so, I just see it all the time...)

    One last thing. You mentioned mutexes. Beware of priority inversion... that can make those "quick accesses" take a very long time in some cases. Most kernel mutex implementations allow you to account for this, but often it's not enabled by default. Again, sorry if this is old news...

    answered Nov 17 '22 by Dan


    A few approaches I've had experience with and have found each good for its own needs. Just writing down my thoughts on this issue, hope this gives you some ideas you can go with...

    • For more complex data with lots of dependencies and constraints, I found a higher-level module is usually preferable, even when it comes at the expense of speed. It saves you a lot of headache, and it's usually the way to go in my experience, unless you have really tough constraints. The idea is to store most of your "static" data, which doesn't change a lot, in this type of module, and benefit from the ease of use and the more "complex" features (e.g. data validation, referential integrity, querying, transactions, privileges, etc.). However, if speed is crucial in some areas, you might want to keep that data out of this module altogether, or implement some simple "mirroring" scheme (where you could, for example, periodically synchronize back and forth between your lightweight mirrors and the main configuration DB in some idle task). Having some experience with this issue on embedded systems, I have found sqlite to be the easiest to work with. Seeing your comments, I can testify the porting is not that big of a deal if you have the C standard library supported for your platform. I have ported the system to a PowerPC architecture in about a week, and we had pure in-house proprietary code. It was mostly a matter of getting the OS abstraction layer figured out, and it was smooth sailing from there. I highly recommend sqlite for this.
    • If the above is still too heavy (either in terms of performance, or perhaps it is just overkill), you can get most of the important benefits from using a basic key-value system. The idea is similar to what was mentioned here before: define your keys via enums, store your data in a table of same-sized (e.g. 64-bit) values, and let each variable reside in one cell. I find this to work for simpler projects, with not THAT many types of configuration entries, while you keep your flexibility and the system is still easy to develop and use. A few pointers for this:
      • Divide your enums into groups or categories, each with its own value range. A scheme I like is to define the groups themselves in one enum, and use those values to form the first value of each of the groups (EntryInGroupID = (group_id << 16) | serial_entry_id).
      • It's easy to implement "tables" using this scheme (where you have more than one entry for each need or configuration, e.g. routing tables). You can generalize the design by making anything a "table", and if you want to define a single value, just make it a table with one entry. In embedded systems I would recommend allocating everything up-front, according to the maximum number of entries possible for that table, and "using" part of it. That way you've covered the worst case and are being deterministic.
      • Another thing I found very useful was using hooks for everything. These hooks can be implemented using function pointers (one for each behavior / policy / operation), which are given through a "descriptor" struct used to define a data entry. Usually, your entries will use the "default" hook functions for read/write (e.g. direct memory access, or file access), and NULL out the other possible hooks (e.g. "validate", "default value", "notify on change" and such). Once in a while, when you have a more complicated case, you can take advantage of the hooks system you've planted with little development cost. This hooks system inherently provides you with a "hardware abstraction layer" for the storage of the values themselves, and can access data from memory, registers, or files with the same ease. A "notify on change" register/unregister facility can also really help for notifying tasks about configuration changes, and is not that hard to add.
    • For ultra-real-time needs, I would just recommend you keep it simple and save your data in structs. It's pretty much the most lightweight way you can go, and it usually handles all your needs. However, it is hard to handle concurrency in this system, so you would either have to wrap all access with mutexes like you suggested, or just keep a mirror for every "task". If this is indeed ultra-real-time, I would go for the mirroring approach with periodic syncs throughout the system. Both this system and the one suggested above make it super easy to serialize your data (as it is always stored in binary format), which will ease things down the road and just keep things simple (all you need for the serialization is a memcpy with a cast or two). Also, using unions from time to time might help, and some optimizers handle them well; however, it really depends on the architecture and toolchain you're working with.
    • If you're really worried about having to upgrade your data for new schemas, you might want to consider an "upgrader module", which can be configured with several schemas, and will be used to convert old formats to the new when invoked. You can use code generation techniques from XML schemas to generate the database itself (e.g. into structs and enums), and take advantage of XML tools to perform the "conversion code", by writing XSLT or just plain python-like scripts. You might want to perform the conversion off-target (e.g. at the host), with a more high-level script, rather than cramming that logic into the target itself.
    • For Data Type Safety you can use lots of template and meta-programming wizardry. While tough to maintain as the framework coder, it eases things up for the users of the system. If coded right (which is tricky!), it can save you lots of cycles and not impose a lot of code overhead (with inlining, explicit template instantiation and much care). If your compiler isn't all that good (which is usually the case with embedded platform compilers), just use wrapper macros for conversion.
    • About the concurrency issue, I generally find myself resorting to one of the following methods, depending on the overall design of the system:
      • Like you suggested, wrap the whole thing with a mutex and hope for the best. You can get finer granularity by adding per-table or even per-entry locks, but this has its own set of problems, and I would recommend keeping it simple and sticking to a global mutex. If you have replications for the hard real-time stuff, this probably won't cripple your performance, since you only do the sync once in a while.
      • Design your database as a "single task", and wrap all access to it via messages (MQs, sockets, whatever). This works better for the non-critical-path data (again), and is an easy solution. If there aren't too many simultaneous accesses going on, this might be the way to go. Note that if you're in the same memory range and have shared memory, reads/writes aren't that expensive (since you aren't copying data), and if you play with your tasks' priorities right, it can be a simple solution that will solve most of your needs (by avoiding the problem of concurrent access altogether).
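The grouped-key table with optional hooks described above can be sketched as follows. All group names, entries, and limits here are illustrative, not from the question:

```c
#include <stdint.h>
#include <stddef.h>

/* Groups get the high bits; entries within a group get the low bits. */
enum group { GROUP_NET = 1, GROUP_LOG = 2 };
#define ENTRY_ID(group, serial)  (((uint32_t)(group) << 16) | (serial))

enum entry {
    NET_MTU     = ENTRY_ID(GROUP_NET, 0),
    NET_TIMEOUT = ENTRY_ID(GROUP_NET, 1),
    LOG_LEVEL   = ENTRY_ID(GROUP_LOG, 0),
};

/* Descriptor with optional hooks; NULL means "use the default path". */
struct entry_desc {
    uint32_t id;
    uint64_t value;                      /* same-sized cell for every entry */
    int  (*validate)(uint64_t proposed); /* NULL = accept anything */
    void (*notify)(uint32_t id);         /* NULL = nobody cares */
};

static int log_level_ok(uint64_t v) { return v <= 7; }

/* Everything allocated up-front, statically, per the requirements. */
static struct entry_desc table[] = {
    { NET_MTU,     1500, NULL,         NULL },
    { NET_TIMEOUT, 30,   NULL,         NULL },
    { LOG_LEVEL,   3,    log_level_ok, NULL },
};

static struct entry_desc *find(uint32_t id)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].id == id)
            return &table[i];
    return NULL;
}

int cfg_set(uint32_t id, uint64_t v)
{
    struct entry_desc *e = find(id);
    if (!e || (e->validate && !e->validate(v)))
        return -1;                       /* unknown key or rejected value */
    e->value = v;
    if (e->notify)
        e->notify(id);                   /* change-notification hook */
    return 0;
}

uint64_t cfg_get(uint32_t id)
{
    struct entry_desc *e = find(id);
    return e ? e->value : 0;
}
```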

    Most systems I've worked with required some type of configuration module, usually over both volatile and non-volatile memory. It's very easy (and even tempting) to over-design this, or bring in an overkill solution where it's not needed. I have found that trying to be "future-proof" on these issues is in a lot of cases just a waste of time; usually the simplest way to go is the best (in real-time and embedded systems, anyway). Also, performance issues tend to creep up when you scale the system, and it's sometimes better to stick with the simpler and faster solution up front.

    Therefore, given the information you've supplied and the above analysis, I'd recommend you go with the key-value or direct structs approach. I'm a personal fan of the "single task" approach for my systems, but there is really no one absolute answer for this, and it requires deeper analysis. Whenever I've looked for a "generic", off-the-shelf solution for these needs, I've always found myself implementing it myself in 1-2 weeks and saving a lot of headaches.

    Hope the answer wasn't overkill :-)

    answered Nov 17 '22 by scooz


    I usually go for a simple dictionary-like API using an int as the key and a fixed-size value. This executes quickly, uses a very small amount of program RAM and has predictable data RAM usage. In other words, the lowest-level API looks like:

    void data_set(uint16 key, uint32 value);
    uint32 data_get(uint16 key);
    

    Keys become a list of constants:

    #define KEY_BOGOMIPS 1
    #define KEY_NERDS_PER_HOUR 2
    

    You handle different data types by casting. Sucks, but you can write macros to make the code a little cleaner. Beware that a plain (float) cast converts the integer's value instead of reinterpreting the stored bits; a union pun (well-defined in C99 and later) keeps the bit pattern intact:

    #define data_get_float(key) \
        ((union { uint32 u; float f; }){ .u = data_get(key) }.f)
    

    Achieving type safety is difficult to do without writing a separate macro or accessor function for each item. On one project, I needed validation of input data, and this became the type-safety mechanism.

    The way you structure the physical storage of data depends how much data memory, program memory, cycles and separate keys you have. If you've got lots of program space, hash the key to get a smaller key that you can look up directly in an array. Usually, I make the underlying storage look like:

    struct data_item_t {
        uint16 key;
        uint32 value;
    };
    
    struct data_item_t items[NUM_ITEMS];
    

    and iterate through. For me, this has been fast enough even on very small (8-bit) microcontrollers, though it might not fit for you if you've got a lot of items.
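A minimal version of that linear scan might look like this, using hypothetical keys and <stdint.h> names in place of the uint16/uint32 typedefs above:

```c
#include <stdint.h>

#define KEY_BOGOMIPS       1
#define KEY_NERDS_PER_HOUR 2
#define NUM_ITEMS          2

struct data_item_t {
    uint16_t key;
    uint32_t value;
};

/* Everything statically defined at boot, per the question's requirements. */
static struct data_item_t items[NUM_ITEMS] = {
    { KEY_BOGOMIPS,       0 },
    { KEY_NERDS_PER_HOUR, 0 },
};

void data_set(uint16_t key, uint32_t value)
{
    for (int i = 0; i < NUM_ITEMS; i++)
        if (items[i].key == key) {
            items[i].value = value;
            return;
        }
}

uint32_t data_get(uint16_t key)
{
    for (int i = 0; i < NUM_ITEMS; i++)
        if (items[i].key == key)
            return items[i].value;
    return 0;   /* unknown key; a real system might assert here */
}
```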

    Remember that your compiler will probably inline or optimise the writes nicely, so cycles per access may be lower than you'd expect.

    answered Nov 18 '22 by Ian Howson