I have been experimenting with implementing some protocol decoders, but each time I run into a "simple" problem where I feel the way I am solving it is not optimal and there must be a better way. I'm using C. Currently I'm using some canned data and reading it in as a file, but later on it will come in via TCP or UDP.
Here's the problem. I'm currently playing with a binary protocol at work. All fields are 8 bits long. The first field (8 bits) is the packet type, so I read that first byte and, using a switch/case, call a function to read in the rest of the packet, since at that point I know its size and structure. BUT... some of these packets have nested packets inside them, so when I encounter one of those I have to read another 8-16 bytes, hit another switch/case to see what the next packet type is, and so on. (Luckily the packets are only nested 2 or 3 deep.) Only once I have the whole packet decoded can I hand it over to my state machine for processing.
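To make that concrete, here is roughly the shape of my current code. The packet-type names, field sizes, and the read_exact() helper are made up for illustration:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical packet types, just to show the structure. */
enum { PKT_STATUS = 0x01, PKT_CONTAINER = 0x02 };

/* Read exactly n bytes from the canned-data file (later: the socket). */
static int read_exact(FILE *f, uint8_t *dst, size_t n)
{
    return fread(dst, 1, n, f) == n ? 0 : -1;
}

static int decode_packet(FILE *f)
{
    uint8_t type, buf[64];

    if (read_exact(f, &type, 1) != 0)      /* first 8-bit field: packet type */
        return -1;

    switch (type) {
    case PKT_STATUS:                       /* fixed size: read the remainder */
        return read_exact(f, buf, 4);
    case PKT_CONTAINER:                    /* carries a nested packet... */
        if (read_exact(f, buf, 8) != 0)    /* ...after its own 8-byte header */
            return -1;
        return decode_packet(f);           /* another type byte, another switch */
    default:
        return -1;
    }
}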
I guess this can be a more general question as well: how much data should you read at a time from the socket? As much as possible? Only as much as the part that is common to all the protocol headers?
So even though this protocol is fairly basic, my code is a whole bunch of switch/case statements, and I do a lot of small reads from the file/socket, which I feel is not optimal. My main aim is to make this decoder as fast as possible. To the more experienced people out there: is this the way to go, or is there a better way that I just haven't figured out yet? Any elegant solution to this problem?
I recommend this approach:
Pseudo C code (imagine that destinationBuffer is a circular buffer - I believe such a data structure is vital for applications that need to parse a lot of incoming data):
forever()
{
    // this function appends newly read data to the buffer
    read_all_you_can(destinationBuffer);
    ...
    handle_data(destinationBuffer);
    // the buffer is then adjusted to reflect how much
    // of the data was actually processed
}
Generally it is better to read as much as possible at a time: fewer, larger reads give better performance.
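For concreteness, here is a minimal runnable sketch of that loop, assuming a POSIX file or socket descriptor. A flat buffer compacted with memmove() stands in for a true circular buffer, and handle_data() is a hypothetical function that consumes as many complete packets as it can and returns how many bytes it used:

#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define BUF_CAP 4096

/* Hypothetical: decode whole packets from buf[0..len), return bytes consumed. */
extern size_t handle_data(const uint8_t *buf, size_t len);

static void decode_loop(int fd)
{
    uint8_t buf[BUF_CAP];
    size_t len = 0;                          /* bytes currently buffered */

    for (;;) {
        /* Read as much as will fit: one syscall per pass, not per field. */
        ssize_t n = read(fd, buf + len, BUF_CAP - len);
        if (n <= 0)
            break;                           /* EOF or error */
        len += (size_t)n;

        /* Process complete packets; a trailing partial packet stays buffered. */
        size_t used = handle_data(buf, len);

        /* Adjust the buffer: shift the unprocessed tail to the front.
         * A real circular buffer would avoid this copy. */
        memmove(buf, buf + used, len - used);
        len -= used;
    }
}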
Resist the temptation to optimize prematurely. First make it work; only then should you think about whether it needs optimization. If you do, do so scientifically: benchmark your code and go for the lowest-hanging fruit first; don't rely on gut feel.
Don't forget that your OS will probably be buffering the data itself, whether you are reading from a file or a socket. Still, repeated syscalls are likely to be a bottleneck, so eliminating them may well be a straightforward optimization win. At a former workplace we avoided this issue by having our packet header explicitly encode its length (never more than 8k): that way we knew exactly how much to bulk-read into an array, and then our own buffering code took over.
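As a sketch of that length-prefix idea: assuming a hypothetical header with a 16-bit big-endian length field and a blocking POSIX descriptor, each packet body can be fetched with a single bulk read:

#include <stdint.h>
#include <unistd.h>

/* Read exactly n bytes, looping over short reads. Returns 0 on success. */
static int read_full(int fd, uint8_t *dst, size_t n)
{
    while (n > 0) {
        ssize_t r = read(fd, dst, n);
        if (r <= 0)
            return -1;                       /* EOF or error */
        dst += r;
        n   -= (size_t)r;
    }
    return 0;
}

/* One bulk read per packet body: the header says exactly how much to fetch. */
static int read_packet(int fd, uint8_t *body, size_t cap)
{
    uint8_t hdr[2];
    if (read_full(fd, hdr, 2) != 0)
        return -1;
    size_t len = ((size_t)hdr[0] << 8) | hdr[1];   /* big-endian length */
    if (len > cap)
        return -1;                           /* caller's buffer is too small */
    if (read_full(fd, body, len) != 0)
        return -1;
    return (int)len;                         /* fits in int: packets capped at 8k */
}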