
Efficient decoding of binary and text structures (packets)

Background

There is a well-known tool called Wireshark. I've been using it for ages. It is great, but performance is the problem. A common usage scenario includes several data-preparation steps to extract a data subset for later analysis. Without that step, filtering alone takes minutes (with big traces Wireshark is next to unusable).


The idea is to create a better solution: fast, parallel, and efficient, to be used as a data aggregator/storage.

Requirements

The main requirement is to use all the power provided by modern hardware. I should say there is room for different types of optimization, and I hope I did a good job on the upper layers, but the choice of technology is the main question right now. According to the current design there are several flavors of packet decoders (dissectors):

  • interactive decoders: decoding logic can be easily changed at runtime. Such an approach can be quite useful for protocol developers: decoding speed is not that critical, but flexibility and fast results are more important
  • embeddable decoders: can be used as a library. This type is supposed to have good performance and be flexible enough to use all available CPUs and cores
  • decoders as a service: can be accessed through a clean API. This type should provide best-of-breed performance and efficiency

Results

My current solution is JVM-based decoders. The idea is to reuse code and eliminate porting while still getting good efficiency.

  • Interactive decoders: implemented in Groovy
  • Embeddable decoders: implemented in Java
  • Decoders as a service: Tomcat + optimizations + embeddable decoders wrapped in a servlet (binary in, XML out); a minimal sketch follows
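
A minimal sketch of that last layer, assuming a hypothetical PacketDecoder facade (the interface and its decodeToXml method are illustrative, not a real library):

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Hypothetical embeddable-decoder facade; stands in for the real library.
    interface PacketDecoder {
        String decodeToXml(byte[] packet);
    }

    // The service layer: a thin servlet around an embeddable decoder,
    // binary request body in, XML document out.
    public class DecoderServlet extends HttpServlet {
        private final PacketDecoder decoder =
            packet -> "<packet length=\"" + packet.length + "\"/>"; // stub

        @Override
        protected void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            byte[] packet = req.getInputStream().readAllBytes(); // Java 9+
            resp.setContentType("application/xml");
            resp.getWriter().write(decoder.decodeToXml(packet));
        }
    }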

Problems to be solved

  • Groovy provides way too much power and everything else, but lacks expressiveness in this particular case
  • Decoding a protocol into a tree structure is a dead end: too many resources are simply wasted
  • Memory consumption is somewhat hard to control. I did several optimizations but am still not happy with the profiling results
  • Tomcat with various bells and whistles still introduces too much overhead (mainly connection handling)

Am I right to use the JVM everywhere? Do you see any other good and elegant way to achieve the initial goal: easy-to-write, highly scalable and efficient protocol decoders?

The protocol, the format of the results, and so on are not fixed.



3 Answers

I've found several possible improvements:

Interactive decoders

Groovy expressiveness can be greatly improved by extending Groovy syntax with AST transformations. That would make it possible to simplify decoder authoring while still providing good performance. An AST transformation (AST stands for Abstract Syntax Tree) is a compile-time technique.

When the Groovy compiler compiles Groovy scripts and classes, at some point in the process, the source code will end up being represented in memory in the form of a Concrete Syntax Tree, then transformed into an Abstract Syntax Tree. The purpose of AST Transformations is to let developers hook into the compilation process to be able to modify the AST before it is turned into bytecode that will be run by the JVM.

I do not want to reinvent the wheel by introducing yet another language to define/describe a protocol structure (ASN.1 is enough). The idea is to simplify decoder development and provide a fast prototyping technique. Basically, some kind of DSL is to be introduced; a sketch of the shape it might take follows.
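
Everything in the sketch below is hypothetical: ProtocolSpec and its builder methods illustrate the declarative style such a DSL could offer, they are not an existing library, and a real implementation would also generate the actual decoding code:

    import java.util.ArrayList;
    import java.util.List;

    // A deliberately tiny, hypothetical builder showing the declarative
    // style a decoder DSL could offer.
    public class ProtocolSpec {
        public record Field(String name, int bytes) {}

        private final String name;
        private final List<Field> fields = new ArrayList<>();

        private ProtocolSpec(String name) { this.name = name; }

        public static ProtocolSpec named(String name) { return new ProtocolSpec(name); }

        public ProtocolSpec bytes(String field, int len) {
            fields.add(new Field(field, len));
            return this;
        }

        public ProtocolSpec uint16(String field) { return bytes(field, 2); }

        public static void main(String[] args) {
            // An Ethernet header described declaratively:
            // two 6-byte MACs followed by a 2-byte type selector.
            ProtocolSpec ethernet = ProtocolSpec.named("ethernet")
                .bytes("dst", 6)
                .bytes("src", 6)
                .uint16("ethertype");
            System.out.println(ethernet.fields);
        }
    }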


Embeddable decoders

Java can introduce some additional overhead, mainly from boxing and object-heavy collections. There are several libraries that address that issue (a sketch follows the list):

  • HPPC
  • Trove
  • Javolution
  • Commons-primitives
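
For example, counting packets per TCP port with boxed Integer keys allocates or unboxes an object per operation, while a primitive map keeps everything as plain ints. The sketch below assumes Trove 3 (gnu.trove.map.hash.TIntIntHashMap); HPPC and the others offer equivalents:

    import java.util.HashMap;
    import java.util.Map;
    import gnu.trove.map.hash.TIntIntHashMap; // Trove 3

    public class PortCounter {
        public static void main(String[] args) {
            // Boxed version: every put/get allocates or unboxes Integer objects.
            Map<Integer, Integer> boxed = new HashMap<>();
            boxed.merge(443, 1, Integer::sum);

            // Primitive version: keys and values stay as ints, no boxing,
            // which matters when counting millions of packets per second.
            TIntIntHashMap counts = new TIntIntHashMap();
            counts.adjustOrPutValue(443, 1, 1); // add 1, or insert 1 if absent
            System.out.println(counts.get(443));
        }
    }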


Frankly speaking, I do not see any option other than Java for this layer.

Decoders as a service

No Java is needed at this layer. Finally I have a good option to go with, though the price is quite high: G-WAN looks really good.


Some additional porting will be required, but it is definitely worth it.



This problem seems to share the same characteristic of many high-performance I/O implementation problems, which is that the number of memory copies dominates performance. The scatter-gather interface patterns for asynchronous I/O follow from this principle. With scatter-gather, blocks of memory are operated on in place. As long as the protocol decoders take block streams as input rather than byte streams, you'll have eliminated a lot of the performance overhead of moving memory around to preserve the byte stream abstraction. The byte stream is a very good abstraction for saving engineering time, but not so good for high-performance I/O.
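
A minimal sketch of that block-stream idea in java.nio terms: a scattering read fills separate header and payload buffers in one call, and decoding then happens in place with absolute reads instead of copies (the file name and Ethernet II offsets are illustrative):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class InPlaceDecode {
        public static void main(String[] args) throws IOException {
            try (FileChannel ch = FileChannel.open(Path.of("trace.bin"),
                                                   StandardOpenOption.READ)) {
                // Scatter: one read call fills two buffers, no intermediate array.
                ByteBuffer header  = ByteBuffer.allocateDirect(14);
                ByteBuffer payload = ByteBuffer.allocateDirect(64 * 1024);
                ch.read(new ByteBuffer[] { header, payload });
                header.flip();
                payload.flip();

                // Decode in place with an absolute read: no cursor, no copy.
                int etherType = header.getShort(12) & 0xFFFF; // Ethernet II type
                System.out.printf("ethertype=0x%04x payloadBytes=%d%n",
                                  etherType, payload.remaining());
            }
        }
    }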

In a related issue, I'd beware of the JVM just because of the basic type String. I can't say I'm familiar with how String is implemented in the JVM, but I do imagine that there's no way of making a string out of a block list without a memory copy. On the other hand, a native kind of string that could be built that way, and which interoperated compatibly with the JVM String, could be a way of splitting the difference.
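
One way to split that difference is a read-only CharSequence view over a byte region: anything that accepts CharSequence can use it without a copy, and a real String is only materialized on toString(). A sketch, assuming single-byte (Latin-1) text:

    import java.nio.ByteBuffer;

    // A zero-copy, read-only text view over a byte region.
    public final class ByteSeq implements CharSequence {
        private final ByteBuffer buf; // shared view; never copied
        private final int offset, length;

        public ByteSeq(ByteBuffer buf, int offset, int length) {
            this.buf = buf; this.offset = offset; this.length = length;
        }

        @Override public int length() { return length; }

        @Override public char charAt(int i) {
            return (char) (buf.get(offset + i) & 0xFF); // assumes Latin-1 text
        }

        @Override public CharSequence subSequence(int start, int end) {
            return new ByteSeq(buf, offset + start, end - start);
        }

        @Override public String toString() {
            return new StringBuilder(length).append(this).toString(); // the one copy
        }
    }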


The other aspect of this problem that seems relevant is that of formal languages. In the spirit of not copying blocks of memory, you also don't want to scan the same block of memory over and over. Since you want to make run-time changes, you probably don't want a precompiled state machine, but rather a recursive descent parser that can dispatch to an appropriate protocol interpreter at each level of descent. There are some complications when an outer layer does not specify the type of an inner layer. Those complications are worse when you don't even get the length of the inner content, since then you're relying on the inner content being well formed to prevent runaway. Nevertheless, it's worth putting some attention into understanding how many times a single block will be scanned.
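
A sketch of that single-pass, dispatching descent over a buffer (the offsets assume Ethernet II and IPv4; the registry and names are illustrative):

    import java.nio.ByteBuffer;
    import java.util.HashMap;
    import java.util.Map;

    // Each layer reads its own header exactly once, then hands a view of
    // the rest of the block to whichever decoder its selector names.
    public class Descent {
        interface Decoder { void decode(ByteBuffer view); }

        // Registry keyed by the outer layer's selector (here: EtherType).
        static final Map<Integer, Decoder> BY_ETHERTYPE = new HashMap<>();

        static void decodeEthernet(ByteBuffer frame) {
            int etherType = frame.getShort(12) & 0xFFFF;
            frame.position(14);                           // step over the header once
            Decoder next = BY_ETHERTYPE.get(etherType);
            if (next != null) next.decode(frame.slice()); // a view, not a copy
        }

        public static void main(String[] args) {
            BY_ETHERTYPE.put(0x0800, ip -> {
                int ihl = (ip.get(0) & 0x0F) * 4; // IPv4 header length in bytes
                int proto = ip.get(9) & 0xFF;     // 6 = TCP, 17 = UDP
                System.out.printf("ipv4 ihl=%d proto=%d%n", ihl, proto);
                // ...dispatch again on proto without rescanning the block
            });

            ByteBuffer frame = ByteBuffer.wrap(new byte[64]);
            frame.put(12, (byte) 0x08).put(13, (byte) 0x00); // EtherType 0x0800
            frame.put(14, (byte) 0x45);                      // IPv4, IHL = 5
            decodeEthernet(frame);
        }
    }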


Network traffic is growing (some analytics), so there will be a need to process more and more data per second.

The only way to achieve that is to use more CPU power, but CPU frequencies have plateaued; only the number of cores is growing. So the only way forward is to use the available cores more efficiently and scale better.
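
Since packets are independent of each other, they shard naturally across cores. A minimal sketch with a fixed pool sized to the machine (the decodePacket body is a placeholder):

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ParallelDecode {
        public static void main(String[] args) throws InterruptedException {
            int cores = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(cores);

            List<byte[]> packets = List.of(new byte[64], new byte[64]);
            for (byte[] p : packets) {
                pool.execute(() -> decodePacket(p)); // one task per packet
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }

        static void decodePacket(byte[] packet) {
            // placeholder for the real per-packet dissection
        }
    }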

