Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Storing and sending raw file data within a JSON object

I'm looking for a way to transfer the raw file data of any file-type with any possible content (By that I mean files and file-content are all user generated) both ways using xhr/ajax calls in a Backbone front-end against a Django back-end.


EDIT: Maybe the question is still unclear...

If you open a file in an IDE (such as Sublime), you can view and edit the actual code that comprises that file. I'm trying to put THAT raw content into a JSON so I can send to the browser, it can be modified, and then sent back.

I posted this question because I was under the impression that because the contents of these files can effectively be in ANY coding language that just stringify-ing the contents and sending it seems like a brittle solution that would be easy to break or exploit. Content could contain any number of ', ", { and } chars that would seem to break JSON formatting, and escaping those characters would leave artifacts within the code that would effectively break them (wouldn't it?).

If that assumption is wrong, THAT would also be an acceptable answer (so long as you could point out whatever it is I'm overlooking).


The project I'm working on is a browser-based IDE that will receive a complete file-structure from the server. Users can add/remove files, edit the content of those files, then save their changes back to the server. The sending/receiving all has to be handled via ajax/xhr calls.

  • Within Backbone, each "file" is instantiated as a model and stored in a Collection. The contents of the file would be stored as an attribute on the model.
  • Ideally, file content would still reliably throw all the appropriate events when changes are made.
  • Fetching contents should not be broken out into a separate call from the rest of the file model. I'd like to just use a single save/fetch call for sending/receiving files including the raw content.

Solutions that require Underscore/jQuery are fine, and I am able to bring in additional libraries if there is something available that specializes in managing that raw file data.

like image 727
relic Avatar asked Sep 08 '15 20:09

relic


People also ask

What is JSON raw data?

Raw JSON text is the format Minecraft uses to send and display rich text to players. It can also be sent by players themselves using commands and data packs. Raw JSON text is written in JSON, a human-readable data format.

Can I send image in JSON?

An image is of the type "binary" which is none of those. So you can't directly insert an image into JSON. What you can do is convert the image to a textual representation which can then be used as a normal string. The most common way to achieve that is with what's called base64.


2 Answers

Interesting question. The code required to implement this would be quite involved, sorry that I'm not providing examples, but you seem like a decent programmer and should be able to implement what's mentioned below.

Regarding the sending of raw data through JSON, all you would need to do to make it JSON-safe and not break your code is to escape the special characters by stringyfying using Python's json.dumps & JavaScript's JSON.stringyfy. [1]

If you are concerned about some form of basic tamper-proofing, then light encoding of your data will fit the purpose, in addition to having the client and server pass a per-session token back and forth with JSON transfers to ensure that the JSON isn't forged from a malicious address.

If you want to check the end-to-end integrity of the data, then generate an md5 checksum and send it inside your JSON and then generate another md5 on arrival and compare with the one inside your JSON.

Base64 encoding: The size of your data would grow by 33% as it encodes four characters to represent three bytes of data.

Base85: Encodes four bytes as five characters and will grow your data by 25%, but uses much more processing overhead than Base64 in Python. That's a 8% improvement in data size, but at the expense of processing overhead. Also it's not string safe as double & single quotation marks, angle brackets, and ampersands cannot be used unescaped inside JSON, as it uses all 95 printable ASCII characters. Needs to be stringyfied before JSON transport. [2]

yEnc has as little as 2-3% overhead (depending on the frequency of identical bytes in the data), but is ruled out by impractical flaws (see [3]).

ZeroMQ Base-85, aka Z85. It's a string-safe variant of Base85, with a data overhead of 25%, which is better than Base64. No stringyfying necessary for sticking it into JSON. I highly recommended this encoding algorithm. [4] [5] [6]

If you're sending only small files (say a few KB), then the overhead of binary-to-text conversion will be acceptable. With files as large as a few Mbs, it might not be acceptable to have them grow by 25-33%. In this case you can try to compress them before sending. [7]

You can also send data to the server using multipart/form-data, but I can't see how this will work bi-directionally.

UPDATE

In conclusion, here's my solution's algorithm:

Sending data

  • Generate a session token and store it for the associated user upon login (server), or retrieve from the session cookie (client)

  • Generate MD5 hash for the data for integrity checking during transport.

  • Encode the raw data with Z85 to add some basic tamper-proofing and JSON-friendliness.

  • Place the above inside a JSON and send POST when requested.

Reception

  • Grab JSON from POST

  • Retrieve session token from storage for the associated user (server), or retrieve from the session cookie (client).

  • Generate MD5 hash for the received data and test against MD5 in received JSON, reject or accept conditionally.

  • Z85-decode the data in received JSON to get raw data and store in file or DB (server) or process/display in GUI/IDE (client) as required.


References

[1] How to escape special characters in building a JSON string?

[2] Binary Data in JSON String. Something better than Base64

[3] https://en.wikipedia.org/wiki/YEnc

[4] http://rfc.zeromq.org/spec:32

[5] Z85 implementation in C/C++ https://github.com/artemkin/z85

[6] Z85 Python implementation of https://gist.github.com/minrk/6357188

[7] JavaScript zip library http://stuk.github.io/jszip/

[8] JavaScript Gzip SO JavaScript implementation of Gzip

like image 181
jv-k Avatar answered Oct 18 '22 09:10

jv-k


AFAI am concerned a simple Base64 conversion will do it. Stringify, convert to base64, then pass it to the server and decode it there. Then you won't have the raw file transfer and you will still maintain your code simple.

I know this solution could seem a bit too simple, but think about it: many cryptographics algorithms can be broken given the right hardware. One of the most secure means would be through a digital certificate and then encrypt data with the private key and then send it over to the server. But, to reach this level of security every user of your application would have to have a digital certificate, which I think would be an excessive demand to your users.

So ask yourself, if implementing a really safe solution adds a lot of hassle to your users, why do you need a safe transfer at all? Based on that I reaffirm what I said before. A simple Base64 conversion will do. You can also use some other algotithms like SHA256 ou something to make it a litter bit safer.

like image 44
Nelson Teixeira Avatar answered Oct 18 '22 09:10

Nelson Teixeira