 

Protocol parsing in C

I have been experimenting with implementing some protocol decoders, but each time I run into a "simple" problem, and I feel that the way I am solving it is not optimal and that there must be a better way. I'm using C. For now I read canned data from a file, but later on the data will arrive over TCP or UDP.

Here's the problem. I'm currently working with a binary protocol at work. All fields are 8 bits long. The first field (8 bits) is the packet type, so I read the first 8 bits and use a switch/case to call a function that reads the rest of the packet, since I then know its size and structure. But some of these packets have nested packets inside them, so when I encounter one of those I have to read another 8-16 bytes and use another switch/case to see what the next packet type is, and so on. (Luckily the packets are only nested 2 or 3 deep.) Only once I have decoded the whole packet can I hand it over to my state machine for processing.
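
The dispatch described above can be sketched as a recursive decoder that works on bytes already in memory instead of issuing a new read for every field. The packet types (`PKT_SIMPLE`, `PKT_WRAPPER`), their sizes and the `packet` struct are hypothetical, invented only for illustration; a real protocol's layout will differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet types, for illustration only. */
enum { PKT_SIMPLE = 0x01, PKT_WRAPPER = 0x02 };
#define MAX_DEPTH 3 /* the question says nesting is only 2 or 3 deep */

typedef struct packet {
    uint8_t type;       /* type of the innermost packet */
    uint8_t fields[2];  /* payload of a PKT_SIMPLE */
    int depth;          /* nesting level at which it was found (0 = top) */
} packet;

/* Decode one packet starting at buf[0]. Returns the number of bytes
   consumed, 0 if more data is needed, or -1 if the packet is malformed. */
static long decode_packet(const uint8_t *buf, size_t len, packet *out, int depth)
{
    if (depth >= MAX_DEPTH)
        return -1;                 /* nested deeper than the protocol allows */
    if (len < 1)
        return 0;
    switch (buf[0]) {
    case PKT_SIMPLE:               /* type byte + two 1-byte fields */
        if (len < 3)
            return 0;
        out->type = PKT_SIMPLE;
        out->fields[0] = buf[1];
        out->fields[1] = buf[2];
        out->depth = depth;
        return 3;
    case PKT_WRAPPER: {            /* type byte + 1-byte header + one inner packet */
        if (len < 2)
            return 0;
        /* the header byte is skipped in this sketch */
        long inner = decode_packet(buf + 2, len - 2, out, depth + 1);
        return inner > 0 ? inner + 2 : inner;
    }
    default:
        return -1;                 /* unknown packet type */
    }
}
```

Because the function only inspects a buffer, it can be called again once more bytes have arrived whenever it returns 0, regardless of whether they came from a file or a socket.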

I guess this can be a more general question as well: how much data should you read from the socket at a time? As much as possible? Only as much as is "similar" across the protocol headers?

So even though this protocol is fairly basic, my code is a whole bunch of switch/case statements, and I do a lot of reading from the file/socket, which I feel is not optimal. My main aim is to make this decoder as fast as possible. To the more experienced people out there: is this the way to go, or is there a better way that I just haven't figured out yet? Is there an elegant solution to this problem?

asked Jun 04 '10 12:06 by NomadAlien

2 Answers

I recommend this approach:

  1. Read as much as you can from the file/socket (keeping the data communication separate from the actual protocol handling)
  2. Pass the data you have read to a procedure that handles (parses) it

Pseudo-C code (imagine that destinationBuffer is a circular buffer; I believe such a data structure is vital for applications that need to parse a lot of incoming data):

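for (;;)
{
  // add whatever data is available to the buffer, updating it;
  // assumes read_all_you_can returns the number of bytes added,
  // 0 on end of file and a negative value on error
  if (read_all_you_can(destinationBuffer) <= 0)
    break;
  /* ... */
  handle_data(destinationBuffer);
  // the buffer is automatically adjusted to reflect how much of the
  // data was processed; a partial packet stays in it until the next
  // read completes it
}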
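
A minimal sketch of the kind of circular buffer destinationBuffer could be, assuming a fixed power-of-two capacity; `ring`, `ring_write`, `ring_peek` and `ring_consume` are hypothetical names, not an existing library. The parser peeks at bytes without removing them and consumes a packet only once it is complete, which is what lets a partially received packet wait for the next read.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_CAP 4096u /* must be a power of two so the index mask works */

typedef struct {
    uint8_t data[RING_CAP];
    size_t head; /* total bytes ever consumed (read position) */
    size_t tail; /* total bytes ever written (write position) */
} ring;

/* Bytes currently stored; correct even after the counters wrap around. */
static size_t ring_used(const ring *r) { return r->tail - r->head; }

/* Append up to len bytes and return how many actually fit.
   Copies byte by byte for clarity; two memcpy calls would be faster. */
static size_t ring_write(ring *r, const uint8_t *src, size_t len)
{
    size_t room = RING_CAP - ring_used(r);
    if (len > room)
        len = room;
    for (size_t i = 0; i < len; i++)
        r->data[(r->tail + i) & (RING_CAP - 1)] = src[i];
    r->tail += len;
    return len;
}

/* Look at the byte `offset` positions past the read position without
   consuming it. The caller must ensure offset < ring_used(r). */
static uint8_t ring_peek(const ring *r, size_t offset)
{
    return r->data[(r->head + offset) & (RING_CAP - 1)];
}

/* Discard n bytes once the parser has fully handled them. */
static void ring_consume(ring *r, size_t n)
{
    if (n > ring_used(r))
        n = ring_used(r);
    r->head += n;
}
```

With this layout, handle_data would peek at the type byte, work out the packet's size, and call `ring_consume` only when `ring_used` shows that the whole packet is present.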

Generally, it is better to read as much as possible at once: fewer, larger reads mean fewer system calls and therefore better performance.
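
As a sketch of "read as much as possible", the hypothetical `buf_fill` below requests all the free space of a simple linear buffer in a single `fread` call rather than one call per field, and the hypothetical `buf_consume` slides any unparsed remainder to the front. It uses standard C `FILE *` streams to match the question's canned-data file; for a socket the same pattern works with `recv`, which returns whatever is available up to the requested size.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BUF_CAP 8192 /* illustrative capacity */

typedef struct {
    unsigned char data[BUF_CAP];
    size_t used; /* bytes held but not yet parsed */
} linbuf;

/* One bulk read into all of the free space.
   Returns the number of bytes added (0 on end of file or error). */
static size_t buf_fill(linbuf *b, FILE *fp)
{
    size_t n = fread(b->data + b->used, 1, BUF_CAP - b->used, fp);
    b->used += n;
    return n;
}

/* Drop the first n bytes the parser has handled, keeping any partial
   packet at the front so the next buf_fill can complete it. */
static void buf_consume(linbuf *b, size_t n)
{
    if (n > b->used)
        n = b->used;
    memmove(b->data, b->data + n, b->used - n);
    b->used -= n;
}
```

The `memmove` keeps the parsing code simple at the cost of copying leftovers; the circular buffer the answer mentions avoids that copy.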

answered Nov 09 '22 23:11 by INS

Resist the temptation to optimize prematurely. First make it work; only then should you think about whether it needs optimizing. If it does, do so scientifically: benchmark your code and go for the lowest-hanging fruit first, rather than relying on gut feel.
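
A minimal timing harness in the spirit of "benchmark, don't guess", using only standard C `clock()`. The names `decoder_fn`, `bench_decoder` and `sum_bytes` are hypothetical, and `sum_bytes` merely stands in for a real decoder. Note that `clock()` measures processor time, which suits a CPU-bound decoder but does not include time spent waiting on I/O.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Assumed signature of the decoder under test (hypothetical). It returns
   a value so that the work it does has an observable result. */
typedef unsigned long (*decoder_fn)(const uint8_t *buf, size_t len);

/* Average processor time per call, in seconds, over `iterations` runs.
   Results are summed into *sink, which makes it harder for the compiler
   to optimise the calls away. */
static double bench_decoder(decoder_fn fn, const uint8_t *buf, size_t len,
                            int iterations, unsigned long *sink)
{
    if (iterations <= 0)
        return 0.0;
    clock_t start = clock();
    for (int i = 0; i < iterations; i++)
        *sink += fn(buf, len);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / iterations;
}

/* Stand-in "decoder" for the example: sums every byte of the input. */
static unsigned long sum_bytes(const uint8_t *buf, size_t len)
{
    unsigned long total = 0;
    for (size_t i = 0; i < len; i++)
        total += buf[i];
    return total;
}
```

Running two versions of a decoder on the same input with the same iteration count shows which change actually helps; for wall-clock timing that includes I/O waits, POSIX `clock_gettime` with `CLOCK_MONOTONIC` is the usual choice.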

Don't forget that your OS will probably be buffering the data itself, whether you are reading from a file or a socket. Still, repeated syscalls are likely to be a bottleneck, so cutting them down may well be a straightforward optimization win. At a former workplace we avoided this issue by having our packet header explicitly encode its length (never more than 8k): that way we knew exactly how much to bulk-read into an array, and then our own buffering code took over.
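
A sketch of that length-prefixed approach, assuming a hypothetical header of one type byte followed by a 2-byte big-endian total frame length (the answer only says the header encoded a length of at most 8k; the exact layout here is invented). The hypothetical `frame_ready` reports whether a complete frame is already buffered and, if not, how many more bytes are needed.

```c
#include <stddef.h>
#include <stdint.h>

#define HDR_LEN   3u    /* hypothetical: type (1 byte) + length (2 bytes, big-endian) */
#define MAX_FRAME 8192u /* "never more than 8k" */

/* Returns the complete frame length (> 0) if that many bytes are buffered,
   0 if more data is needed (and sets *need to the number of bytes missing),
   or -1 if the header's length field is out of range. */
static long frame_ready(const uint8_t *buf, size_t avail, size_t *need)
{
    if (avail < HDR_LEN) {
        *need = HDR_LEN - avail;
        return 0;
    }
    /* In this sketch the length field counts the header too. */
    size_t total = ((size_t)buf[1] << 8) | buf[2];
    if (total < HDR_LEN || total > MAX_FRAME)
        return -1;
    if (avail < total) {
        *need = total - avail;
        return 0;
    }
    *need = 0;
    return (long)total;
}
```

After a return of 0, the caller knows exactly how many bytes to bulk-read before the frame can be handed to the decoder.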

answered Nov 09 '22 23:11 by crazyscot
