
Most efficient format for transferring data to and from embedded devices

I'm having a hard time choosing the format in which my server and my endpoints will communicate.
I am considering:

  • JSON
  • YAML (too hard to parse)
  • CSV
  • Google Protobufs
  • Binary packing/unpacking (with no use of casting/memset/memcpy to enable portability)
  • Some form of DSL
  • Any other suggestion you might have

My criteria, ordered from most to least important:

  1. Which is the easiest to parse?
  2. Which is the fastest to parse?
  3. Which produces the smallest messages in bytes?
  4. Which has the potential to have the most readable messages?
  5. Which has the potential to be encrypted more easily?
  6. Which has the potential to be compressed more easily?

EDIT to clarify:

  • Are the data transfers bi-directional? Yes.
  • What is the physical transport? Ethernet.
  • Is the data formatted as packets or streams? Both but usually packets.
  • How much RAM do the end-points have? The smallest amount possible; it depends on the format I choose.
  • How big are your data? As big as it needs to be. I won't receive huge datasets though.
  • Does the end-point have an RTOS? No.

asked Aug 02 '10 06:08 by the_drow


2 Answers

Key factors are:

  • What capabilities do your clients have? (e.g. Can you pick an XML parser off the shelf without ruling out most of them for performance reasons? Can you compress the packets on the fly?)
  • How complex is your data ("flat" or deeply structured)?
  • Do you need high-frequency updates? Partial updates?

In my experience:

A simple text protocol (which would be categorized as a DSL) with an interface of

string RunCommand(string commandAndParams)
// e.g. RunCommand("version") returns "1.23"

makes many aspects easier: debugging, logging and tracing, extension of protocol, etc. Having a simple terminal / console for the device is invaluable in tracking down problems, running tests etc.
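As a rough illustration of how small such an interface can be on the device side, here is a minimal sketch in C. The command names, the `FW_VERSION` value, and the `run_command` signature are assumptions made for this example, not part of any real protocol; the caller supplies the reply buffer so no dynamic allocation is needed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical firmware version, used only for illustration. */
#define FW_VERSION "1.23"

/* One entry in the command table: a name and its handler. */
typedef struct {
    const char *name;
    int (*handler)(const char *args, char *out, size_t out_size);
} command_t;

/* "version": ignores its arguments and reports the firmware version. */
static int cmd_version(const char *args, char *out, size_t out_size) {
    (void)args;
    snprintf(out, out_size, "%s", FW_VERSION);
    return 0;
}

/* "echo": copies its arguments back, handy for link tests. */
static int cmd_echo(const char *args, char *out, size_t out_size) {
    snprintf(out, out_size, "%s", args);
    return 0;
}

static const command_t commands[] = {
    { "version", cmd_version },
    { "echo",    cmd_echo    },
};

/* Splits "name args..." at the first space and dispatches to the handler.
   Returns 0 on success, -1 for an unknown command. */
int run_command(const char *line, char *out, size_t out_size) {
    size_t name_len = strcspn(line, " ");
    const char *args = line + name_len;
    while (*args == ' ') {
        args++; /* skip the separator spaces */
    }

    for (size_t i = 0; i < sizeof commands / sizeof commands[0]; i++) {
        if (strlen(commands[i].name) == name_len &&
            strncmp(line, commands[i].name, name_len) == 0) {
            return commands[i].handler(args, out, out_size);
        }
    }
    snprintf(out, out_size, "ERR unknown command");
    return -1;
}
```

Adding a command means adding one handler and one table row, which is a large part of why this style of protocol is easy to extend and to drive from a terminal.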

Let's discuss the limitations in detail, as a point of reference for the other formats:

  • The client needs to run a micro parser. That's not as complex as it might sound (the core of my "micro parser library" is 10 functions with about 200 lines of code total), but basic string processing must be available on the device.
  • A badly written parser is a big attack surface. If the devices are critical/sensitive, or are expected to run in a hostile environment, implementation requires utmost care. (That's true for other protocols, too, but a quickly hacked text parser is easy to get wrong.)
  • Overhead. Can be limited by a mixed text/binary protocol, or by base64 for binary payloads (which adds about 33% overhead, or roughly 37% with MIME line breaks).
  • Latency. With typical network latency, you will not want to issue many small commands; some way of batching requests and their replies helps.
  • Encoding. If you have to transfer strings that aren't representable in ASCII, and can't use something like UTF-8 for that on both ends, the advantage of a text-based protocol drops rapidly.
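To put a number on the base64 overhead from the list above: every 3 raw bytes become 4 output characters, so the encoded size can be computed up front. The sketch below assumes standard padded base64 without line breaks; the function names are made up for this example. MIME-style line wrapping (a CRLF after every 76 characters) pushes the figure to roughly 37%.

```c
#include <stddef.h>

/* Length of the padded base64 encoding (no line breaks) of n raw bytes:
   each group of up to 3 input bytes becomes 4 output characters. */
size_t base64_encoded_len(size_t n) {
    return 4 * ((n + 2) / 3);
}

/* Size overhead in percent, rounded down. n must be greater than 0. */
unsigned base64_overhead_percent(size_t n) {
    size_t enc = base64_encoded_len(n);
    return (unsigned)((enc - n) * 100 / n);
}
```

For a 300-byte payload this gives 400 encoded bytes, i.e. 33% overhead; very short payloads fare worse because of padding (1 raw byte still costs 4 characters).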

I'd use a binary protocol only if required by the device, if device processing capabilities are insanely low (say, USB controllers with 256 bytes of RAM), or if your bandwidth is severely limited. Most of the protocols I've worked with are binary, and it's a pain.
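If you do end up with a binary protocol, the portable packing the question asks for (no casting of structs, no memcpy) is usually done byte by byte with shifts. A minimal sketch, assuming the wire format is big-endian; the function names are invented for this example:

```c
#include <stdint.h>

/* Writes a 32-bit value as 4 bytes, most significant byte first.
   The result is the same on any host, whatever its endianness
   or struct padding rules. */
void put_u32_be(uint8_t *buf, uint32_t v) {
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
}

/* Reads 4 big-endian bytes back into a 32-bit value. */
uint32_t get_u32_be(const uint8_t *buf) {
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
            (uint32_t)buf[3];
}
```

The pain mentioned above shows up once messages have many fields: every field needs a matching put/get pair at a fixed offset on both ends, and a mismatch is hard to spot in a hex dump.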

Google Protobuf is an approach to make a binary protocol somewhat easier. A good choice if you can run the libraries on both ends, and have enough freedom to define the format.

CSV is a way to pack a lot of data into an easily parsed format, so that's an extension of the text format. It's very limited in structure, though. I'd use that only if you know your data fits.
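To show how little parsing flat CSV needs, here is a sketch that reads one line of comma-separated integers into a fixed array using only `strtol` from the standard library. It assumes the simplest possible dialect (decimal integers, no quoting, no trailing newline); `parse_csv_ints` is a name made up for this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Parses comma-separated decimal integers from line into fields.
   Returns the number of fields parsed, or -1 if the line is malformed
   or holds more than max_fields values. */
int parse_csv_ints(const char *line, long *fields, size_t max_fields) {
    size_t count = 0;
    const char *p = line;

    while (count < max_fields) {
        char *end;
        long v = strtol(p, &end, 10);
        if (end == p) {
            return -1;            /* no digits where a number was expected */
        }
        fields[count++] = v;
        if (*end == '\0') {
            return (int)count;    /* clean end of line */
        }
        if (*end != ',') {
            return -1;            /* unexpected character after a number */
        }
        p = end + 1;              /* step over the comma */
    }
    return -1;                    /* more values than max_fields */
}
```

Anything beyond this (quoted strings, optional fields, nested records) is where the structural limits of CSV start to show.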

XML/YAML/... I'd use only if processing power isn't an issue, bandwidth either isn't an issue or you can compress on the fly, and the data has a very complex structure. JSON seems to be a little lighter on overhead and parser requirements, and might be a good compromise.

answered Sep 29 '22 10:09 by peterchen


Usually in these cases it pays to customize the data format for the device. For example depending on the restrictions you face in terms of network or storage size, you can go for streaming compression or prefer full compression. Also the type of data you want to store is a big factor.

If your biggest problem really is ease of parsing, you should go for XML, but on an embedded device ease of parsing is usually much less of a concern than transfer speed, storage size and CPU consumption. JSON and YAML, much like XML, are focused first and foremost on ease of parsing. Protobuf might squeeze in there; binary packing is what people usually do. Encryption and compression are better handled at the transport level, although functionally you should aim to put as little information as possible in a message.

I know I'm not giving you a clear-cut answer, but I don't think there is one for such a generic question.

answered Sep 29 '22 10:09 by iwein