Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dynamic recognition and handling of protocol data units in bytestream

I implemented a protocol for a little multiplayer game. It was based on bytes, so to deserialize the received messages I had to iterate over the byte stream and parse it bit by bit. After I had all the bytes and knew the message type, I threw the bytes in a reverse constructor that constructed the protocol data unit from the raw bytes.

This whole process was very ugly, not really OO and had unreadable if/else code. I had to implement the reverseConstructor(byte[] bytes) for every protocol data unit (pdu) I added. An approach where some kind of schema is defined per pdu (e. g. schema = [1 byte int (id = x), x bytes ascii string, 4 bytes double]), and where the handling of the bytes is done with that schema, would be more elegant.

I got a hint here on SO to use google's protobufs (Apparently they are not fitting my needs, since I would have to change the protocol to adhere to protobuf standards).

INFO

I can't change the protocol. There are two different scenarios (I don't want to support them at the same time or even in the same program):

  • The protocol data units have a length field encoded in the header
  • The protocol data units have no length field, but one can derive from the message type when/where the message ends.

I personally am a fan of length fields. But sometimes you have to adhere to a protocol that somebody else designed. So the protocols are fix. They all have a header which contains the protocol id, the unique message id, and in the first scenario a length field.

QUESTION

Can anyone give me a very small example with two simple protocol data units that are parsed by a efficient, generic receive method? The only examples I found in the protobuf tutorials were of type: user a sends message x, user b expects message X and can deserialize it without problem.

But what if user b has to be prepared for message x, y and z. How can one handle this situation without much code duplication in an intelligent way.

I would also appreciate hints to design principles that enable me to achieve greater code here without the use of a extern library.


EDIT

I think sth like that is the way to go. You can find more of the code here. The bytes are read dynamically till an object is found, and then the position of the buffer is reset.

                while (true) {
                        if (buffer.remaining() < frameLength) {
                                buffer.reset();
                                break;
                        }
                        if (frameLength > 0) {
                                Object resultObj = prototype.newBuilderForType().mergeFrom(buffer.array(), buffer.arrayOffset() + buffer.position(), frameLength).build();
                                client.fireMessageReceived(resultObj);
                                buffer.position(buffer.position() + frameLength);
                                buffer.mark();
                        }
                        if (buffer.remaining() > fieldSize) {
                                frameLength = getFrameLength(buffer);
                        } else {
                                break;
                        }
                }

JavaDoc - mergeFrom

Parse data as a message of this type and merge it with the message being built. This is just a small wrapper around MessageLite.Builder.mergeFrom(CodedInputStream). https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/Message.Builder#mergeFrom(byte[])

The problem is the part message of this type, but it should be possible to address this issue with a generic approach.


SAMPLE

Here is a sample protocol data unit. It has a length field. There is another scenario where the pdus have no length field. This pdu is of variable size. There are also pdus of fixed size.

sample pdu

For completeness' sake. Here the representation of strings in the protocol data units.

strings in pdu

like image 812
mike Avatar asked Aug 16 '13 09:08

mike


Video Answer


2 Answers

(Note: it's been a while since I've used Java so I wrote this in C#, but you should get the general idea)

The general idea is:

  1. Each of your parsers should be basically represented as an interface, or a delegate (or a method, or a function pointer) with a signature of something like:

    interface IParser<T>
    {   
         IParserResult<T> Parse(IIndexable<byte> input);
    }
    
  2. The result of the parsing operation is an instance of the IParserResult<T> interface, which should tell you the following:

    • Whether the parsing succeeded,

    • If it failed, why it failed (not enough data to finish parsing, not the right parser, or CRC error, or exception while parsing),

    • If it succeeded, the actual parsed message value,

    • If it succeeded, the next parser offset.

    In other words, something like:

    interface IParserResult<T>
    {
         boot Success { get; } 
         ErrorType Error { get; } // in case it failed
         T Result { get; } // null if failed
         int BytesToSkip { get; } // if success, number of bytes to advance 
    }
    
  3. Your parser thread should iterate through a list of parsers and check results. It should look more or less like this:

    // presuming inputFifo is a Queue<byte> 
    while (inputFifo.ContainsData) 
    {
         foreach (IParser parser in ListOfParsers) 
         {
             var result = parser.Parse(inputFifo);
    
             if (result.Success) 
             {
                 FireMessageReceived(result.Value);
                 inputFifo.Skip(result.BytesToSkip);
                 break;
             }
    
             // wrong parser? try the next one
             if (result.ErrorType == ErrorType.UnsupportedData)
             {
                 continue;
             }
    
             // otherwise handle errors
             switch (result.ErrorType) 
             {
                 ...
             }
         }
    }
    

The IIndexable<byte> interface is not a part of .NET, but it's rather important for avoiding numerous array allocations (this is the CodeProject article).

The good thing about this approach is that the Parse method can do a whole lot of checks to determine if it "supports" a certain message (check the cookie, length, crc, whatever). We use this approach when parsing data which is constantly being received on a separate thread from unreliable connections, so each parser also returns a "NotEnoughData" error if the length is too short to tell if the message is valid or not (in which case the loop breaks and waits for more data).

[Edit]

Additionally (if this also helps you), we use a list (or a dictionary to be precise) of "message consumers" which are strongly typed and tied to a certain parser/message type. This way only the interested parties are notified when a certain message is parsed. It's basically a simple messaging system where you need to create a list of parsers, and a dictionary of mappings (message type -> message consumer).

like image 30
Groo Avatar answered Oct 02 '22 14:10

Groo


1. Protocol design

Frankly, it is a common mistake to create a first protocol implementation without any further changes in mind. Just as an exercise, let's try to design flexible protocol.

enter image description here

Basically, the idea is to have several frames encapsulated into each other. Please note, you have Payload ID available, so it is easy to identify next frame in the sequence.

You can use Wireshark in order to see real-life protocols usually follow the same principle.

enter image description here

Such approach simplifies packet dissection a lot, but it is still possible to deal with other protocols.

2. Protocol decoding(dissection)

I spent quite a lot of time developing next generation network analyzer for my previous company.

enter image description here

Can't expose all the details, but one of the key features was flexible protocol stack, capable of identifying protocol frames. RTP is a good example, because there is no hint on the lower layer (usually UDP) next frame is a RTP frame. Special VM was developed to execute dissectors and control the process.

The good news I have smaller personal projects with Java-based dissectors (I'll skip some javadocs in order to save several lines).

/**
 * High-level dissector contract definition. Dissector is meant to be a simple
 * protocol decoder, which analyzes protocol binary image and produces number
 * of fields.
 *
 * @author Renat.Gilmanov
 */
public interface Dissector {

    /**
     * Returns dissector type.
     */
    DissectorType getType();

    /**
     * Verifies packet data belongs to the protocol represented by this dissector.
     */
    boolean isProtocol(DataInput input, Dissection dissection);

    /**
     * Performs the dissection.
     */
    Dissection dissect(DataInput input, Dissection dissection);

    /**
     * Returns a protocol which corresponds to the current dissector.
     *
     * @return a protocol instance
     */
    Protocol getProtocol();
}

Protocol itself knows upper layer protocol, so when there is no direct hint available it is possible to iterate through known protocols and use isProtocol method in order to identify the next frame.

public interface Protocol {

    // ...

    List<Protocol> getUpperProtocols(); }

As I said RTP protocol is a bit tricky to handle:

enter image description here

So let's check implementation details. Verification is based on several known facts about the protocol:

/**
 * Verifies current frame belongs to RTP protocol.
 *
 * @param input data input
 * @param dissection initial dissection
 * @return true if protocol frame is RTP
 */
@Override
public final boolean isProtocol(final DataInput input, final Dissection dissection) {
    int available = input.available();
    byte octet    = input.getByte();
    byte version  = getVersion(octet);
    byte octet2   = input.getByte(1);
    byte pt       = (byte) (octet2 & 0x7F);

    return ((pt < 0x47) & (RTP_VERSION == version));
}

Dissection is just a set of basic operations:

public final Dissection dissect(DataInput input, Dissection d) {

    // --- protocol header --------------------------------
    final byte octet1 = input.getByte(0);
    final byte version = getVersion(octet1);
    final byte p = (byte) ((octet1 & 0x20) >> 5);
    final byte x = (byte) ((octet1 & 0x10) >> 4);
    final byte cc = (byte) ((octet1 & 0x0F));

    //...

    // --- seq --------------------------------------------
    final int seq = (input.getInt() & 0x0000FFFF);
    final int timestamp = input.getInt();
    final int ssrc = input.getInt();

Finally you can define a Protocol stack:

public interface ProtocolStack {

    String getName();

    Protocol getRootProtocol();

    Dissection dissect(DataInput input, Dissection dissection, DissectOptions options);
}

Under the hood it handles all the complexity and decodes a packet, frame by frame. The biggest challenge is to make the dissection process bullet-proof and stable. Using such or similar approach you'll be able to organize you protocol decoding code. It is likely proper implementation of isProtocol will let you handle different version and so on. Anyway, I would not say this approach is simple, but it provides a lot of flexibility and control.

3. Is there any universal solution?

Yes, there is ASN.1:

Abstract Syntax Notation One (ASN.1) is a standard and notation that describes rules and structures for representing, encoding, transmitting, and decoding data in telecommunications and computer networking. The formal rules enable representation of objects that are independent of machine-specific encoding techniques. Formal notation makes it possible to automate the task of validating whether a specific instance of data representation abides by the specifications. In other words, software tools can be used for the validation.

Here is an example of a protocol defined using ASN.1:

FooProtocol DEFINITIONS ::= BEGIN

    FooQuestion ::= SEQUENCE {
        trackingNumber INTEGER,
        question       IA5String
    }

    FooAnswer ::= SEQUENCE {
        questionNumber INTEGER,
        answer         BOOLEAN
    }

END

BTW, there is Java Asn.1 Compiler available:

JAC (Java Asn1 Compiler) is a tool for you if you want to (1)parse your asn1 file (2)create .java classes and (3)encode/decode instances of your classes. Just forget all asn1 byte streams, and take the advantage of OOP! BER, CER and DER are all supported.

Finally

I usually recommend to do several simple PoCs in order to find best possible solution. I decided not to use ASN.1 in order to reduce complexity and have some room for optimization, but it might help you.

Anyway, try everything you can and let us know about the results :)

You can also check the following topic: Efficient decoding of binary and text structures (packets)

4. Update: bidirectional approach

I'm sorry for quite a long answer. I just want you to have enough options to find best possible solution. Answering the question regarding bidirectional approach:

  • Option 1: you can use symmetrical serialization approach: define DataOutput, write serialization logic -- you are done. I will just recommend to looks through BerkeleyDB API and TupleBinding. It does solve the same problem providing a full control on store/restore procedure.

This class takes care of converting the entries to/from TupleInput and TupleOutput objects. Its two abstract methods must be implemented by a concrete subclass to convert between tuples and key or data objects.

entryToObject(TupleInput)
objectToEntry(Object,TupleOutput)
  • Option 2: The most universal way is to define your structure containing a set of fields. Every field requires the following information:
    • name
    • type
    • size (bits)

For example, for RTP it will look like the following:

Version:          byte (2 bits)
Padding:          bool (1 bit)
Extension:        bool (1 bit)
CSRC Count:       byte (4 bits) 
Marker:           bool (1 bit)
Payload Type:     byte (7 bits)
Sequence Number:  int  (16 bits)

Having that you can define generic way of reading/writing such structures. Closest working example I know is Javolution Struct. Please look through, they have a really good examples:

class Clock extends Struct { // Hardware clock mapped to memory.
     Unsigned16 seconds  = new Unsigned16(5); // unsigned short seconds:5 bits
     Unsigned16 minutes  = new Unsigned16(5); // unsigned short minutes:5 bits
     Unsigned16 hours    = new Unsigned16(4); // unsigned short hours:4 bits
     ...
 }
like image 62
Renat Gilmanov Avatar answered Oct 02 '22 14:10

Renat Gilmanov