Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Use casts to access a byte-array like a structure?

I'm working on a microcontroller-based software project. A part of the project is a parser for a binary protocol. The protocol is fixed and cannot be changed. A PC is acting as a "master" and mainly transmits commands, which have to be executed by the "slave", the microcontroller board.

The protocol data is received by a hardware communication interface, e.g. UART, CAN or Ethernet. That's not the problem.

After all bytes of a frame (4 - 10, depending on the command) are received, they are stored in a buffer of type "uint8_t cmdBuffer[10]" and a flag is set, indicating that the command can now be executed. The first byte of a frame (cmdBuffer[0]) contains the command, the rest of the frame are parameters for the command, which may differ in number and size, depending on the command. This means, the payload can be interpreted in many ways. For every possible command, the data bytes change their meaning.

I don't want to have much ugly bit operations, but self-documentating code. So my approach is:

  • I create a "typedef struct" for each command
  • After determining the command, the pointer to the cmdBuffer is casted to a pointer of my new typedef
  • by doing so, I can access the command's parameters as structure members, avoiding magic numbers in array acces, bit operations for parameters > 8 bit, and it is easier to read

Example:

typedef struct
{
    uint8_t commandCode;
    uint8_t parameter_1;
    uint32_t anotherParameter;
    uint16 oneMoreParameter;
}payloadA_t;

//typedefs for payloadB_t and payloadC_t, which may have different parameters

void parseProtocolData(uint8_t *data, uint8_t length)
{
  uint8_t commandToExecute;

  //this, in fact, just returns data[0]
  commandToExecute = getPayloadType(data, length);

  if (commandToExecute == COMMAND_A)
  {
    executeCommand_A( (payloadA_t *) data);
  }
  else if (commandToExecute == COMMAND_B)
  {
    executeCommand_B( (payloadB_t *) data);
  }
  else if (commandToExecute == COMMAND_C)
  {
    executeCommand_C( (payloadC_t *) data);
  }
  else
  {
    //error, unknown command
  }
}

I see two problems with this:

  • First, depending on the microcontroller architecture, the byteorder may be intel or motorola for 2 or 4- byte parameters. This should not be much problem. The protocol itself uses network byte order. On the target controller, a macro can be used for correcting the order.

  • The major problem: there may be padding bytes in my tyepdef struct. I'm using gcc, so i can just add a "packed"-attribute to my typedef. Other compilers provide pragmas for this. However, on 32-bit machines, packed structures will result in bigger (and slower) machine code. Ok, this may also be not a problem. But I'v heard, there can be a hardware fault when accessing un-aligned memory (on ARM architecture, for example).

There are many commands (around 50), so I don't want access the cmdBuffer as an array I think the "structure approach" increases code readability in contrast to the "array approach"

So my questions:

  • Is this approach OK, or is it just a dirty hack?
  • are there cases where the compiler can rely on the "strict aliasing rule" and make my approach not work?
  • Is there a better solution? How would you solve this problem?
  • Can this be kept, at least a little, portable?

Regards, lugge

like image 793
lugge86 Avatar asked Mar 19 '23 19:03

lugge86


2 Answers

Generally, structs are dangerous for storing data protocols because of padding. For portable code, you probably want to avoid them. Keeping the raw array of data is therefore the best idea still. You only need a way to interpret it differently depending on the received command.

This scenario is a typical example where some kind of polymorphism is desired. Unfortunately, C has no built-in support for that OO feature, so you'll have to create it yourself.

The best way to do this depends on the nature of these different kinds of data. Since I don't know that, I can only suggest on such way, it may or may not be optimal for your specific case:

typedef enum
{
  COMMAND_THIS,
  COMMAND_THAT,
  ... // all 50 commands

  COMMANDS_N // a constant which is equal to the number of commands
} cmd_index_t;


typedef struct
{
  uint8_t      command;        // the original command, can be anything
  cmd_index_t  index;          // a number 0 to 49
  uint8_t      data [MAX];     // the raw data
} cmd_t;

Step one would then be that upon receiving a command, you translate it to the corresponding index.

// ...receive data and place it in cmdBuffer[10], then:
cmd_t cmd;
cmd_create(&cmd, cmdBuffer[0], &cmdBuffer[1]);

...

void cmd_create (cmd_t* cmd, uint8_t command, uint8_t* data)
{
   cmd->command = command;
   memcpy(cmd->data, data, MAX);

   switch(command)
   {
     case THIS: cmd->index = COMMAND_THIS; break;
     case THAT: cmd->index = COMMAND_THAT; break;
     ... 
   }
}

Once you have an index 0 to N means that you can implement lookup tables. Each such lookup table can be an array of function pointers, which determine the specific interpretation of the data. For example:

typedef void (*interpreter_func_t)(uint8_t* data);

const interpreter_func_t interpret [COMMANDS_N] =
{
  &interpret_this_command,
  &interpret_that_command,
  ...
};

Use:

interpret[cmd->index] (cmd->data);

Then you can make similar lookup tables for different tasks.

   initialize [cmd->index] (cmd->data);
   interpret  [cmd->index] (cmd->data);
   repackage  [cmd->index] (cmd->data);
   do_stuff   [cmd->index] (cmd->data);
   ...

Use different lookup tables for different architectures. Things like endianess can be handled inside the interpreter functions. And you can of course change the function prototypes, maybe you need to return something or pass more parameters etc.

Note that the above example is most suitable when all commands result in the same kind of actions. If you need to do entirely different things depending on command, other approaches are more suitable.

like image 139
Lundin Avatar answered Apr 01 '23 14:04

Lundin


IMHO it is a dirty hack. The code may break when ported to a system with different alignment requirements, different variable sizes, different type representations (e.g. big endian / little endian). Or even on the same system but different version of compiler / system headers / whatever.

I don't think it violates strict aliasing, so long as the relevant bytes form a valid representation.

I would just write code to read the data in a well-defined manner, e.g.

bool extract_A(PayloadA_t *out, uint8_t const *in)
{
    out->foo = in[0];
    out->bar = read_uint32(in + 1, 4);
    // ...
}

This may run slightly slower than the "hack" version, it depends on your requirements whether you prefer maintenance headaches, or those extra microseconds.

like image 26
M.M Avatar answered Apr 01 '23 12:04

M.M