How does axios handle blob vs arraybuffer as responseType?

I'm downloading a zip file with axios. For further processing, I need to get the "raw" data that has been downloaded. As far as I can see, in JavaScript there are two types for this: Blobs and ArrayBuffers. Both can be specified as responseType in the request options.

In the next step, the zip file needs to be uncompressed. I've tried two libraries for this: js-zip and adm-zip. Both want the data to be an ArrayBuffer. So far so good, I can convert the blob to a buffer. After this conversion, adm-zip always happily extracts the zip file. However, js-zip complains about a corrupted file unless the zip has been downloaded with 'arraybuffer' as the axios responseType. js-zip does not work on a buffer that has been taken from a blob.

This was very confusing to me. I thought both ArrayBuffer and Blob were essentially just views on the underlying memory. There might be a difference in performance between downloading something as a blob vs. a buffer, but the resulting data should be the same, right?

Well, I decided to experiment and found this:

If you specify responseType: 'blob', axios converts the response.data to a string. Let's say you hash this string and get hashcode A. Then you convert it to a buffer. For this conversion, you need to specify an encoding. Depending on the encoding, you will get a variety of new hashes, let's call them B1, B2, B3, ... When specifying 'utf8' as the encoding, I get back to the original hash A.

So I guess when downloading data as a 'blob', axios implicitly converts it to a string encoded with utf8. This seems very reasonable.

Now you specify responseType: 'arraybuffer'. Axios provides you with a buffer as response.data. Hash the buffer and you get a hashcode C. This code does not correspond to any code in A, B1, B2, ...

So when downloading data as an 'arraybuffer', you get entirely different data?

It now makes sense to me that the unzipping library js-zip complains if the data is downloaded as a 'blob'. It probably actually is corrupted somehow. But then how is adm-zip able to extract it? And I checked the extracted data, it is correct. This might only be the case for this specific zip archive, but nevertheless surprises me.

Here is the sample code I used for my experiments:

    // TypeScript import syntax, this is executed in Node.js
    import axios from 'axios';
    import * as crypto from 'crypto';

    axios.get(
        "http://localhost:5000/folder.zip", // hosted with serve
        { responseType: 'blob' }) // replace this with 'arraybuffer' and response.data will be a buffer
        .then((response) => {
            console.log(typeof (response.data));

            // first hash the response itself
            console.log(crypto.createHash('md5').update(response.data).digest('hex'));

            // then convert to a buffer and hash again
            // replace 'binary' with any valid encoding name
            let buffer = Buffer.from(response.data, 'binary');
            console.log(crypto.createHash('md5').update(buffer).digest('hex'));
            //...

What creates the difference here, and how do I get the 'true' downloaded data?

asked Feb 28 '20 14:02

lhk

1 Answer

From axios docs:

    // `responseType` indicates the type of data that the server will respond with
    // options are: 'arraybuffer', 'document', 'json', 'text', 'stream'
    //   browser only: 'blob'
    responseType: 'json', // default

`'blob'` is a "browser only" option.

So from node.js, when you set responseType: "blob", "json" will actually be used, which I guess falls back to "text" when no parseable JSON data has been fetched.
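
In node.js the practical consequence is to request 'arraybuffer' instead, which (as the question reports) gives you a Buffer in response.data. If a library insists on a plain ArrayBuffer, something like the following sketch may help. Note that `toArrayBuffer` and `responseData` are hypothetical names used for illustration; they are not part of axios, and the four bytes merely stand in for a real download.

```javascript
// Hypothetical helper (not part of axios): copy a Node.js Buffer's bytes
// into a standalone ArrayBuffer.
function toArrayBuffer(buf) {
  // A Buffer can be a view onto a larger pooled ArrayBuffer, so slice out
  // only the region that actually belongs to this Buffer.
  return buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);
}

// Stand-in for `response.data` from a request made with
// `responseType: 'arraybuffer'`: the ZIP local file header signature.
const responseData = Buffer.from([0x50, 0x4b, 0x03, 0x04]);

const arrayBuffer = toArrayBuffer(responseData);
const bytes = new Uint8Array(arrayBuffer);

console.log(arrayBuffer instanceof ArrayBuffer); // true
console.log(Array.from(bytes)); // [ 80, 75, 3, 4 ]
```

Slicing with byteOffset and byteLength matters because small Buffers may share one pooled ArrayBuffer; passing `buf.buffer` directly could hand unrelated bytes to the unzip library.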

Fetching binary data as text is prone to generating corrupted data. Because the text returned by Body.text() and many other APIs is a USVString (which doesn't allow unpaired surrogate code points), and because the response is decoded as UTF-8, some bytes from the binary file can't be mapped to characters correctly and will thus be replaced by the � (U+FFFD) replacement character, with no way to get back what that data was before: your data is corrupted.
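
This also accounts for the hashes seen in the question. The sketch below is a minimal illustration, not what axios does internally: the byte values in `raw` are made up to stand in for a downloaded zip, and `md5` is a small helper defined only for this example.

```javascript
const crypto = require('node:crypto');

// Made-up stand-in for a downloaded zip: the ZIP signature (0x50 0x4B 0x03 0x04)
// followed by two bytes (0x89, 0xFF) that are not valid UTF-8 on their own.
const raw = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0x89, 0xff, 0x00, 0x41]);

const md5 = (data) => crypto.createHash('md5').update(data).digest('hex');

// Roughly what a text-based responseType does: decode the bytes as UTF-8.
const asText = raw.toString('utf8');

// Re-encoding the string as UTF-8 does not restore the original bytes.
const roundTripped = Buffer.from(asText, 'utf8');

const hashOfString = md5(asText);     // hash "A": strings are hashed as UTF-8
const hashOfUtf8 = md5(roundTripped); // hash "B" for the 'utf8' encoding
const hashOfRaw = md5(raw);           // hash "C": the true bytes

console.log(hashOfString === hashOfUtf8); // true
console.log(hashOfUtf8 === hashOfRaw);    // false
```

Hash A matches the 'utf8' variant only because `update()` encodes a string as UTF-8 before hashing; neither of them corresponds to the bytes that were actually sent.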

Here is a snippet explaining this, using the header of a .png file 0x89 0x50 0x4E 0x47 as an example.

    (async () => {

      const url = 'https://upload.wikimedia.org/wikipedia/commons/4/47/PNG_transparency_demonstration_1.png';
      // fetch as binary
      const buffer = await fetch( url ).then( resp => resp.arrayBuffer() );

      const header = new Uint8Array( buffer ).slice( 0, 4 );
      console.log( 'binary header', header ); // [ 137, 80, 78, 71 ]
      console.log( 'entity encoded', entityEncode( header ) );
      // [ "U+0089", "U+0050", "U+004E", "U+0047" ]
      // You can read more about the U+0089 character here
      // https://www.fileformat.info/info/unicode/char/0089/index.htm
      // You can see in the left table how this character in UTF-8 needs two bytes (0xC2 0x89)
      // A lone 0x89 byte is thus not a valid UTF-8 sequence:
      // the decoder discards it and emits the replacement character instead

      // read as UTF-8
      const utf8_str = await new Blob( [ header ] ).text();
      console.log( 'read as UTF-8', utf8_str ); // "�PNG"
      // build back a binary array from that string
      const utf8_binary = [ ...utf8_str ].map( char => char.charCodeAt( 0 ) );
      console.log( 'Which is binary', utf8_binary ); // [ 65533, 80, 78, 71 ]
      console.log( 'entity encoded', entityEncode( utf8_binary ) );
      // [ "U+FFFD", "U+0050", "U+004E", "U+0047" ]
      // You can read more about the character � (U+FFFD) here
      // https://www.fileformat.info/info/unicode/char/0fffd/index.htm
      //
      // P (U+0050), N (U+004E) and G (U+0047) characters are compatible between UTF-8 and UTF-16
      // For these nothing is lost in the encoding
      // (that's how base64 encoding makes it possible to send binary data as text)

      // now let's see what fetching as text holds
      const fetched_as_text = await fetch( url ).then( resp => resp.text() );
      const header_as_text = fetched_as_text.slice( 0, 4 );
      console.log( 'fetched as "text"', header_as_text ); // "�PNG"
      const as_text_binary = [ ...header_as_text ].map( char => char.charCodeAt( 0 ) );
      console.log( 'Which is binary', as_text_binary ); // [ 65533, 80, 78, 71 ]
      console.log( 'entity encoded', entityEncode( as_text_binary ) );
      // [ "U+FFFD", "U+0050", "U+004E", "U+0047" ]
      // It's been read as UTF-8, we lost the first byte.

    })();

    function entityEncode( arr ) {
      return Array.from( arr ).map( val => 'U+' + toHex( val ) );
    }
    function toHex( num ) {
      return num.toString( 16 ).padStart( 4, '0' ).toUpperCase();
    }
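
The same point can be checked offline in node.js. The sketch below is only an illustration under the assumption that a Buffer round trip is a fair stand-in for "send as text, read back as bytes"; `roundTrip` and `results` are names invented for this example. It takes the four PNG signature bytes through a string and back with several encodings and reports which ones give the original bytes back.

```javascript
// The PNG signature bytes used in the snippet above.
const header = Buffer.from([0x89, 0x50, 0x4e, 0x47]);

// Take the bytes through a string and back, using the given encoding.
const roundTrip = (encoding) =>
  Buffer.from(header.toString(encoding), encoding);

const results = {};
for (const encoding of ['utf8', 'latin1', 'base64', 'hex']) {
  results[encoding] = roundTrip(encoding).equals(header);
}

console.log(results);
// { utf8: false, latin1: true, base64: true, hex: true }
console.log(header.toString('base64')); // iVBORw==
```

Encodings that assign a character to every possible byte value (or byte group) are reversible; UTF-8 decoding is not, which is why a response that was already decoded as UTF-8 cannot be repaired afterwards by picking a different encoding.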

Historically there was no native Blob object in node.js (a global Blob only arrived in later versions), so it makes sense that axios didn't monkey-patch one just to return a response nothing else would be able to consume anyway.

From a browser, you'd have exactly the same responses:

    function fetchAs( type ) {
      return axios( {
        method: 'get',
        url: 'https://upload.wikimedia.org/wikipedia/commons/4/47/PNG_transparency_demonstration_1.png',
        responseType: type
      } );
    }

    function loadImage( data, type ) {
      // we can pass all of them to the Blob constructor directly
      const new_blob = new Blob( [ data ], { type: 'image/png' } );
      // with blob: URI, the browser will try to load 'data' as-is
      const url = URL.createObjectURL( new_blob );

      // look up the <img> element matching this responseType
      const img = document.getElementById( type + '_img' );
      img.src = url;
      return new Promise( (res, rej) => {
        img.onload = e => res( img );
        img.onerror = rej;
      } );
    }

    [
      'json', // will fail
      'text', // will fail
      'arraybuffer',
      'blob'
    ].forEach( type =>
      fetchAs( type )
       .then( resp => loadImage( resp.data, type ) )
       .then( img => console.log( type, 'loaded' ) )
       .catch( err => console.error( type, 'failed' ) )
    );

    <script src="https://unpkg.com/axios/dist/axios.min.js"></script>

    <figure>
      <figcaption>json</figcaption>
      <img id="json_img">
    </figure>
    <figure>
      <figcaption>text</figcaption>
      <img id="text_img">
    </figure>
    <figure>
      <figcaption>arraybuffer</figcaption>
      <img id="arraybuffer_img">
    </figure>
    <figure>
      <figcaption>blob</figcaption>
      <img id="blob_img">
    </figure>

answered Sep 18 '22 20:09

Kaiido

Tags: javascript, node.js, blob, axios, arraybuffer
