 

Most efficient way (time and space wise) to send binary data in response

My setup is a Flask-based server. A bird's-eye view of the project: the Flask server fetches binary data from AWS S3 based on some algorithmic calculations (such as working out which filenames to fetch from S3) and serves that data to an HTML+JavaScript client.

At first, I thought a JSON object would be the best response type, so I created a JSON response with the following (possibly syntactically incorrect) format:

{
  "payload": [
    {
      "symbol": "sym",
      "exchange": "exch",
      "headerfile": {
        "name": "#name",
        "content": "#binarycontent"
      },
      "datafiles": [
        {
          "name": "#name",
          "content": "#binarycontent"
        },
        {
          "name": "#name",
          "content": "#binarycontent"
        }
      ]
    }
  ],
  "errors": []
}

I apologise for any syntax errors in the JSON; I am a bit too sleepy to track down minor mistakes. After structuring this JSON, I learned that JSON doesn't natively support binary data, so I can't embed the raw bytes as values in it.

I realize that I can always convert the bytes into a base64-encoded string and use that string as the value in JSON. But the resulting string is about 33% larger, because base64 emits 4 characters for every 3 input bytes: 4010 bytes of data were encoded into 5348 bytes. While that is insignificant for a single binary chunk, my client sees it as a concern when many such chunks are embedded in one JSON response. The extra size means the response takes longer to reach the client, which is a crucial concern for my client's application.
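The overhead follows directly from how base64 works: every 3 input bytes become 4 ASCII characters, so n bytes encode to 4 * ceil(n / 3) characters. The minimal sketch below reproduces the 4010 to 5348 figure, using random bytes as a stand-in for a chunk fetched from S3 (the field names mirror the format above and are illustrative only). It also shows that json.dumps rejects raw bytes but accepts the base64 text, which decodes back to the original bytes.

```python
import base64
import json
import math
import os

def base64_size(n_bytes: int) -> int:
    # base64 maps each 3-byte group to 4 characters; a short final group is padded with '='
    return 4 * math.ceil(n_bytes / 3)

raw = os.urandom(4010)  # random bytes standing in for a chunk fetched from S3
encoded = base64.b64encode(raw).decode("ascii")

# json.dumps refuses raw bytes...
try:
    json.dumps({"content": raw})
except TypeError as exc:
    print("raw bytes rejected:", exc)

# ...but accepts the base64 text, which round-trips back to the original bytes
body = json.dumps({"name": "#name", "content": encoded})
assert base64.b64decode(json.loads(body)["content"]) == raw

print(len(raw), len(encoded), base64_size(len(raw)))  # 4010 5348 5348
print(f"overhead: {len(encoded) / len(raw) - 1:.1%}")  # overhead: 33.4%
```

The overhead is proportional to the data size, so it grows with every chunk added to the response.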

Another option I considered was to stream the binary chunks to the client with the application/octet-stream Content-Type. But I am not sure it's any better than the solution above. Furthermore, I haven't been able to figure out how to associate each binary chunk with its name in that case.
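One common way to keep names and chunks together in a single application/octet-stream body is length-prefixed framing: before each chunk, write its name and its length so the reader knows where one record ends and the next begins. The record layout in this sketch (big-endian uint16 name length, UTF-8 name, uint32 content length, content) is a hypothetical format chosen for illustration, not a standard, and the JavaScript client would need matching parsing code, for example a DataView over the response's ArrayBuffer.

```python
import struct

# Hypothetical wire format (an assumption for illustration, not a standard):
#   [uint16 name length][name, UTF-8][uint32 content length][content]
# repeated once per file, with all integers big-endian.
NAME_LEN = struct.Struct(">H")
CONTENT_LEN = struct.Struct(">I")

def pack_chunks(chunks: dict[str, bytes]) -> bytes:
    """Concatenate named binary chunks into one length-prefixed byte string."""
    out = bytearray()
    for name, content in chunks.items():
        encoded_name = name.encode("utf-8")
        out += NAME_LEN.pack(len(encoded_name))
        out += encoded_name
        out += CONTENT_LEN.pack(len(content))
        out += content
    return bytes(out)

def unpack_chunks(data: bytes) -> dict[str, bytes]:
    """Split a byte string produced by pack_chunks back into {name: content}."""
    chunks = {}
    offset = 0
    while offset < len(data):
        (name_len,) = NAME_LEN.unpack_from(data, offset)  # raises struct.error if truncated
        offset += NAME_LEN.size
        name = data[offset:offset + name_len].decode("utf-8")
        offset += name_len
        (content_len,) = CONTENT_LEN.unpack_from(data, offset)
        offset += CONTENT_LEN.size
        if offset + content_len > len(data):
            raise ValueError(f"truncated content for chunk {name!r}")
        chunks[name] = data[offset:offset + content_len]
        offset += content_len
    return chunks
```

In Flask, the packed bytes could be returned with flask.Response(body, mimetype="application/octet-stream"). Each record costs 6 bytes plus the UTF-8 name regardless of chunk size, instead of growing by a third as with base64.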

Is there a solution better than 'convert binary to text and embed into JSON'?

asked Mar 19 '14 at 19:03 by Jayesh Bhoot





1 Answer

I solved the problem, and will write down the solution hoping it could save someone else's time.

Thank you, @dstromberg and @LukasGraf, for your advice. I checked out BSON first, found it sufficient for my needs, and so never looked into Protocol Buffers in detail.

BSON is available on PyPI in two packages. In pymongo, it comes bundled with the MongoDB driver. In bson, it is a standalone package, which suited my needs; however, that package supports only Python 2. So before rolling out my own port, I looked around for a Python 3 implementation and found another implementation of the BSON spec on bsonspec.org: Link to the module.

The simplest usage of that module goes like this:

>>> import bson
warning: module typecheck.py cannot be imported, type checking is skipped
>>> encoded = bson.serialize_to_bytes({'name': 'chunkfile', 'content': b'\xad\x03\xae\x03\xac\x03\xac\x03\xd4\x13'})
>>> print(encoded)
b'1\x00\x00\x00\x02name\x00\n\x00\x00\x00chunkfile\x00\x05content\x00\n\x00\x00\x00\x00\xad\x03\xae\x03\xac\x03\xac\x03\xd4\x13\x00'
>>> decoded = bson.parse_bytes(encoded)
>>> print(decoded)
OrderedDict([('name', 'chunkfile'), ('content', b'\xad\x03\xae\x03\xac\x03\xac\x03\xd4\x13')])
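The encoded bytes above show why BSON avoids the base64 penalty: a binary value is stored as a type byte (0x05), the NUL-terminated key, a little-endian int32 length, a subtype byte, and then the raw bytes, with no text encoding at all. The pure-Python sketch below handles only the two element types used in this example (UTF-8 string 0x02 and generic binary 0x05). It is meant to illustrate the layout, not to replace a real BSON library, and it reproduces the encoded bytes printed above.

```python
import struct

def encode_bson(doc: dict) -> bytes:
    """Encode a flat dict of str/bytes values as a BSON document (sketch only)."""
    body = bytearray()
    for key, value in doc.items():
        cname = key.encode("utf-8") + b"\x00"  # element name is a NUL-terminated cstring
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # type 0x02 (string): int32 byte length including the trailing NUL, then the bytes
            body += b"\x02" + cname + struct.pack("<i", len(data)) + data
        elif isinstance(value, (bytes, bytearray)):
            # type 0x05 (binary): int32 length, subtype 0x00 (generic), then the raw bytes
            body += b"\x05" + cname + struct.pack("<i", len(value)) + b"\x00" + bytes(value)
        else:
            raise TypeError(f"unsupported value type for {key!r}: {type(value).__name__}")
    # document = int32 total size (counting itself), the elements, a terminating NUL
    return struct.pack("<i", len(body) + 5) + bytes(body) + b"\x00"

def decode_bson(data: bytes) -> dict:
    """Decode a flat BSON document containing only string and binary elements."""
    (total,) = struct.unpack_from("<i", data, 0)
    if total != len(data) or data[-1:] != b"\x00":
        raise ValueError("malformed BSON document")
    doc = {}
    pos = 4
    while data[pos] != 0:  # the final NUL byte ends the element list
        etype = data[pos]
        pos += 1
        end = data.index(b"\x00", pos)
        key = data[pos:end].decode("utf-8")
        pos = end + 1
        (length,) = struct.unpack_from("<i", data, pos)
        pos += 4
        if etype == 0x02:
            doc[key] = data[pos:pos + length - 1].decode("utf-8")  # drop the trailing NUL
            pos += length
        elif etype == 0x05:
            pos += 1  # skip the subtype byte
            doc[key] = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported element type 0x{etype:02x}")
    return doc
```

Each binary field therefore costs a fixed 7 bytes plus the key name (type byte, key terminator, 4 length bytes, subtype byte), and each document adds 5 bytes, however large the chunk is.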

As you can see, it handles binary data as well. I sent the data from Flask with mimetype=application/bson, and the receiving JavaScript parsed it correctly using this standalone BSON library provided by the MongoDB team.

answered Oct 21 '22 at 18:10 by Jayesh Bhoot
