
binary protocols v. text protocols

does anyone have a good definition for what a binary protocol is? and what is a text protocol actually? how do these compare to each other in terms of bits sent on the wire?

here's what wikipedia says about binary protocols:

A binary protocol is a protocol which is intended or expected to be read by a machine rather than a human being (http://en.wikipedia.org/wiki/Binary_protocol)

oh come on!

to be more clear, if I have a jpg file, how would it be sent through a binary protocol and how through a text one? in terms of bits/bytes sent on the wire, of course.

at the end of the day if you look at a string it is itself an array of bytes so the distinction between the 2 protocols should rest on what actual data is being sent on the wire. in other words, on how the initial data (jpg file) is encoded before being sent.

der_grosse asked Apr 15 '10 12:04




2 Answers

Binary protocol versus text protocol isn't really about how binary blobs are encoded. The difference is really whether the protocol is oriented around data structures or around text strings. Let me give an example: HTTP. HTTP is a text protocol, even though when it sends a jpeg image, it just sends the raw bytes, not a text encoding of them.

But what makes HTTP a text protocol is that the exchange to get the jpg looks like this:

Request:

    GET /files/image.jpg HTTP/1.0
    Connection: Keep-Alive
    User-Agent: Mozilla/4.01 [en] (Win95; I)
    Host: hal.etc.com.au
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
    Accept-Language: en
    Accept-Charset: iso-8859-1,*,utf-8

Response:

    HTTP/1.1 200 OK
    Date: Mon, 19 Jan 1998 03:52:51 GMT
    Server: Apache/1.2.4
    Last-Modified: Wed, 08 Oct 1997 04:15:24 GMT
    ETag: "61a85-17c3-343b08dc"
    Content-Length: 60830
    Accept-Ranges: bytes
    Keep-Alive: timeout=15, max=100
    Connection: Keep-Alive
    Content-Type: image/jpeg

    <binary data goes here>

Note that this could very easily have been packed much more tightly into a structure that would look (in C) something like

Request:

    struct request {
      int requestType;
      int protocolVersion;
      char path[1024];
      char user_agent[1024];
      char host[1024];
      long int accept_bitmask;
      long int language_bitmask;
      long int charset_bitmask;
    };

Response:

    struct response {
      int responseType;
      int protocolVersion;
      time_t date;
      char host[1024];
      time_t modification_date;
      char etag[1024];
      size_t content_length;
      int keepalive_timeout;
      int keepalive_max;
      int connection_type;
      char content_type[1024];
      char data[];
    };

Here the field names would not have to be transmitted at all, and, for example, the responseType in the response structure is an int with the value 200 instead of the three characters '2' '0' '0'. That's what a text-based protocol is: one that is designed to be communicated as a flat stream of (usually human-readable) lines of text, rather than as structured data of many different types.
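To make the 200 versus '2' '0' '0' point concrete, here is a minimal sketch in C. It assumes a 16-bit big-endian status field; the width, the byte order, and the function names encode_status_text and encode_status_binary are illustrative choices of mine, not part of HTTP or of the structs above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Text encoding: write the status as decimal ASCII digits, e.g. "200".
 * Returns the number of bytes written (not counting the terminating NUL),
 * or 0 if the buffer is too small. */
size_t encode_status_text(uint16_t status, char *out, size_t cap)
{
    int n = snprintf(out, cap, "%u", (unsigned)status);
    if (n < 0 || (size_t)n >= cap)
        return 0;
    return (size_t)n;
}

/* Binary encoding: write the status as a fixed-width 16-bit integer,
 * most significant byte first. Width and byte order are assumptions
 * of this sketch; a real protocol would have to specify both. */
size_t encode_status_binary(uint16_t status, unsigned char out[2])
{
    out[0] = (unsigned char)(status >> 8);
    out[1] = (unsigned char)(status & 0xFF);
    return 2;
}
```

For a status of 200, the text form puts the three bytes 0x32 0x30 0x30 on the wire, while this binary form puts the two bytes 0x00 0xC8. Writing the bytes out one at a time, rather than sending a raw struct, sidesteps differences in padding and endianness between machines.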

Tyler McHenry answered Nov 16 '22 00:11


Here's a kind-of cop-out definition:

You'll know it when you see it.

This is one of those cases where it is very hard to find a concise definition that covers all corner cases. But it is also one of those cases where the corner cases are completely irrelevant, because they simply do not occur in real life.

Pretty much all protocols that you will encounter in real life will either look like this:

    > fg,m4wr76389b zhjsfg gsidf7t5e89wriuotu nbsdfgizs89567sfghlkf
    >  b9er t8ß03q+459tw4t3490ß´5´3w459t srt üßodfasdfäasefsadfaüdfzjhzuk78987342
    < mvclkdsfu93q45324äö53q4lötüpq34tasä#etr0 awe+s byf eart

[Imagine a ton of other non-printable crap there. One of the challenges in conveying the difference between text and binary is that you have to do the conveying in text :-)]

Or like this:

    < HELLO server.example.com
    > HELLO client.example.com
    < GO
    > GETFILE /foo.jpg
    < Length: 3726
    < Type: image/jpeg
    < READY?
    > GO
    < ... server sends 3726 bytes of binary data ...
    > ACK
    > BYE

[I just made this up on the spot.]

There's simply not that much ambiguity there.
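The "you'll know it when you see it" test can be roughly approximated in code. The sketch below is only a heuristic of my own, not a definition: it treats a buffer as text if every byte is printable ASCII or a tab, carriage return, or line feed. The function name looks_like_text is made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if every byte in buf is printable ASCII (0x20..0x7E)
 * or one of TAB, CR, LF. An empty buffer counts as text.
 * This is a rough heuristic, not a definition of "text protocol". */
bool looks_like_text(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        bool printable = (c >= 0x20 && c <= 0x7E);
        bool whitespace = (c == '\t' || c == '\r' || c == '\n');
        if (!printable && !whitespace)
            return false;
    }
    return true;
}
```

A line such as "HELLO server.example.com" followed by CR LF passes this check, while the opening bytes of a typical JPEG file (0xFF 0xD8 0xFF ...) fail it on the very first byte. Note that a text protocol can still carry a binary payload, as in the made-up exchange above, so a check like this only makes sense for the command and header lines.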

Another definition that I have sometimes heard is

a text protocol is one that you can debug using telnet

Maybe I am showing my nerdiness here, but I have actually written and read e-mails via SMTP and POP3, read usenet articles via NNTP and viewed web pages via HTTP using telnet, for no other reason than to see whether it would actually work.

Actually, while writing this, I kinda caught the fever again:

    bash-4.0$ telnet smtp.googlemail.com 25
    Trying 74.125.77.16...
    Connected to googlemail-smtp.l.google.com.
    Escape character is '^]'.
    < 220 googlemail-smtp.l.google.com ESMTP Thu, 15 Apr 2010 19:19:39 +0200
    > HELO
    < 501 Syntactically invalid HELO argument(s)
    > HELO client.example.com
    < 250 googlemail-smtp.l.google.com Hello client.example.com [666.666.666.666]
    > RCPT TO:Me <[email protected]>
    < 503 sender not yet given
    > SENDER:Me <[email protected]>
    < 500 unrecognized command
    > RCPT FROM:Me <[email protected]>
    < 500 unrecognized command
    > FROM:Me <[email protected]>
    < 500-unrecognized command
    > HELP
    < 214-Commands supported:
    < 214 AUTH HELO EHLO MAIL RCPT DATA NOOP QUIT RSET HELP ETRN
    > MAIL FROM:Me <[email protected]>
    < 250 OK
    > RCPT TO:You <[email protected]>
    < 250 Accepted
    > DATA
    < 354 Enter message, ending with "." on a line by itself
    > From: Me <[email protected]>
    > To: You <[email protected]>
    > Subject: Testmail
    >
    > This is a test.
    > .
    < 250 OK id=1O2Sjq-0000c4-Qv
    > QUIT
    < 221 googlemail-smtp.l.google.com closing connection
    Connection closed by foreign host.

Damn, it's been quite a while since I've done this. Quite a few errors in there :-)

Jörg W Mittag answered Nov 16 '22 00:11