
Accessing binary data from Javascript, Ajax, IE: can responseBody be read from Javascript (not VB)?

First of all, I am aware of this question:

  • How do I load binary image data using Javascript and XMLHttpRequest?

and specifically the best answer therein, http://emilsblog.lerch.org/2009/07/javascript-hacks-using-xhr-to-load.html.

So accessing binary data from Javascript works in Firefox (and in later versions of Chrome, which actually seem to work too; I don't know about Opera). So far so good. But I am still hoping to find a way to access binary data with a modern IE (ideally IE 6, but at least IE 7+), without using VB. It has been mentioned that XHR.responseBody would not work (if the data contains zero bytes), but I was wondering whether this has been resolved in newer versions, or whether there are alternate settings that would allow simple binary data access.

My specific use case is accessing data returned by a web service that is encoded using a binary data transfer format (including byte combinations that are not legal in UTF-8 encoding).

StaxMan asked Sep 13 '10 21:09


2 Answers

It's possible with IE10, using responseType=arraybuffer or blob. You only had to wait for a few years...

http://msdn.microsoft.com/en-us/library/ie/br212474%28v=vs.94%29.aspx

http://msdn.microsoft.com/en-us/library/ie/hh673569%28v=vs.85%29.aspx
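
As a rough illustration of the responseType approach, here is a minimal sketch. The helper name `loadBinary` and its callback parameters are made up for this example, and it assumes a browser (IE10 or later) that supports `responseType = "arraybuffer"` and typed arrays:

```javascript
// Hypothetical helper (not from any library): fetch `url` and pass the raw
// bytes to `onBytes` as a Uint8Array, or an Error to `onError`.
function loadBinary(url, onBytes, onError) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", url, true);
  // Ask for the body as an ArrayBuffer so no text decoding is applied.
  xhr.responseType = "arraybuffer";
  xhr.onload = function () {
    if (xhr.status >= 200 && xhr.status < 300) {
      // xhr.response is an ArrayBuffer; wrap it to read individual bytes.
      onBytes(new Uint8Array(xhr.response));
    } else {
      onError(new Error("HTTP " + xhr.status));
    }
  };
  xhr.onerror = function () {
    onError(new Error("network error"));
  };
  xhr.send();
}
```

A caller would then read bytes directly, for example `loadBinary("/data.bin", function (bytes) { var first = bytes[0]; }, function (err) { /* handle */ })`, where "/data.bin" is a placeholder URL. Zero bytes and bytes that are invalid in UTF-8 should come through untouched, since no character decoding is involved.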

Damien answered Oct 17 '22 12:10


OK, I have found some interesting leads, although not a completely good solution yet.

One obvious thing I tried was to play with encodings. There are three obvious candidates that really should work:

  • Latin-1 (aka ISO-8859-1): it is a single-byte encoding that maps one-to-one to Unicode. So theoretically it should be enough to declare a content type of "text/plain; charset=ISO-8859-1" and get one character per byte. Alas, due to the idiotic logic of browsers (and the even more idiotic mandate of HTML 5!), some transcoding occurs that changes the high control character range (codes 128 - 159) in strange ways. Apparently this is because browsers are required to assume that the encoding really is Windows-1252 (why? For some silly reasons... but it is what it is).
  • UCS-2 is a fixed-length 2-byte encoding that predated UTF-16; it simply splits 16-bit character codes into 2 bytes. Alas, browsers do not seem to support it.
  • UTF-16 might work, theoretically, but there is the problem of the surrogate code units (0xD800 - 0xDFFF), which are reserved. If byte pairs that encode these values are included, corruption occurs.
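
To make the UTF-16 problem concrete, here is a small sketch that scans a byte array for 16-bit units a UTF-16 decoder could not pass through unchanged. The function name `findUnsafeUtf16Unit` is made up for illustration, and it assumes little-endian byte order. It also treats a correctly paired high/low surrogate as safe, which is my assumption about decoder behaviour rather than something tested across browsers:

```javascript
// Hypothetical helper: returns the byte offset of the first 16-bit unit
// (little-endian) that is an unpaired surrogate, or -1 if there is none.
// A trailing odd byte is ignored by this sketch.
function findUnsafeUtf16Unit(bytes) {
  for (var i = 0; i + 1 < bytes.length; i += 2) {
    var unit = bytes[i] | (bytes[i + 1] << 8);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      // High surrogate: only acceptable when a low surrogate follows.
      var next = i + 3 < bytes.length ? (bytes[i + 2] | (bytes[i + 3] << 8)) : -1;
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i += 2; // skip the low surrogate as well
        continue;
      }
      return i;
    }
    if (unit >= 0xDC00 && unit <= 0xDFFF) {
      return i; // low surrogate without a preceding high surrogate
    }
  }
  return -1;
}
```

Arbitrary binary data will hit such byte pairs sooner or later, so a check like this only shows where the corruption would start; it does not make UTF-16 usable as a transport.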

However: it seems the conversion for Latin-1 might be reversible, and if so, I bet I could make use of it after all. All mutations are from 1 byte (0x00 - 0xFF) into larger-than-byte values, and there are no ambiguous mappings, at least in Firefox. If this holds true for other browsers, it will be possible to map values back and remove the ill effects of automatic transcoding. That would then work for multiple browsers, including IE (with the caveat of needing something special to deal with null values).
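
A minimal sketch of that reverse mapping might look like the following. It assumes the browser decodes the response using the Windows-1252 table from the WHATWG Encoding Standard; the names `CP1252_HIGH`, `CP1252_REVERSE` and `textToBytes` are made up here, and this has not been verified against IE:

```javascript
// Code points that Windows-1252 assigns to bytes 0x80..0x9F, in byte order.
// Five bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to the same-valued code point.
var CP1252_HIGH = [
  0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
  0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
];

// Reverse lookup: transcoded code point -> original byte value.
var CP1252_REVERSE = {};
for (var b = 0; b < CP1252_HIGH.length; b++) {
  CP1252_REVERSE[CP1252_HIGH[b]] = 0x80 + b;
}

// Hypothetical helper: turn responseText back into an array of byte values.
function textToBytes(text) {
  var bytes = new Array(text.length);
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    if (code > 0xFF) {
      // Every mutated byte becomes a code point above 0xFF, so map it back.
      if (!(code in CP1252_REVERSE)) {
        throw new Error("unexpected character at index " + i);
      }
      code = CP1252_REVERSE[code];
    }
    bytes[i] = code;
  }
  return bytes;
}
```

For example, `textToBytes(xhr.responseText)` would be called on a response served as "text/plain; charset=ISO-8859-1". This does nothing about the null-byte caveat mentioned above; it only undoes the remapping of the 128 - 159 range.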

Finally, some useful links for conversions of datatypes are:

  • http://www.merlyn.demon.co.uk/js-exact.htm#IEEE (to handle floating points to/from binary IEEE representation)
  • http://jsfromhell.com/classes/binary-parser (for general parsing)

StaxMan answered Oct 17 '22 12:10