
JavaScript BLOB coming out much larger than the input

Tags:

javascript

We receive a local file (typically a PDF, PNG or JPG) by drag and drop into a variable (using dropzone.js; at this stage it's a data URL, i.e. base64 plus a prefix specifying the file type). We encrypt it (now it's binary) into a JavaScript variable. We then create a Blob from that variable and upload it to a server running PHP. (See our earlier post on how to send a JS variable to PHP $_FILES.)
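If the file is already sitting in a data URL (as dropzone.js provides it here), the base64 step can be undone before encryption. A minimal sketch, assuming a runtime with a global `atob` (browsers, Node 16+); `dataURLToBytes` is a hypothetical helper name, not part of dropzone.js:

```javascript
// Hypothetical helper: decode a base64 data URL such as
// "data:application/pdf;base64,JVBERi0..." into its MIME type and raw bytes.
function dataURLToBytes(dataURL) {
  const comma = dataURL.indexOf(",");
  const header = comma >= 0 ? dataURL.slice(0, comma) : "";
  if (!header.startsWith("data:") || !header.endsWith(";base64")) {
    throw new Error("expected a base64 data URL");
  }
  // Strip the leading "data:" and the trailing ";base64" to get the type.
  const type = header.slice("data:".length, -";base64".length);
  const binary = atob(dataURL.slice(comma + 1)); // one char per byte, codes 0-255
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return { type, bytes };
}

const decoded = dataURLToBytes("data:text/plain;base64,SGVsbG8=");
console.log(decoded.type);         // text/plain
console.log(decoded.bytes.length); // 5
```

Working on the decoded bytes from here on avoids carrying the base64 overhead any further through the pipeline.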

We are finding that the .size of the blob is about 50% larger than the .length of the file we are uploading. (We had been converting to base64 and uploading with JSON; one reason we want to change is to avoid the 33% size increase that base64 adds.)

The blob is consistently about 50% larger, from moderate sizes up to large ones. As a small test, we created a Blob from 120 characters of input and found Blob.size to be 210. (We normally use the correct file.type; image/png was just there so the data would be interpreted as binary that didn't need encoding.) From actual use in our code: we uploaded a 900K PDF file with a type like 'application/pdf', and the resulting blob was about 1,400K. We also tried a PNG.

I would think that the Blob should be about the same size as the input, no? What might we be doing wrong?

new Blob(["123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"], {type:"image/png"});
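A quick way to check what `Blob.size` measures, assuming modern browsers or Node 18+ (global `Blob` and `TextEncoder`): string parts are stored as UTF-8, so the size matches `TextEncoder` output, and for pure ASCII it equals the string length. The `type` option only sets `blob.type`; it does not change how the string is encoded.

```javascript
// Blob stores string parts as UTF-8; TextEncoder produces the same bytes.
// The type option only labels the Blob, it does not change the encoding.
const utf8 = new TextEncoder();
for (const text of ["1234567890".repeat(12), "\u00e9"]) {
  const blob = new Blob([text], { type: "image/png" });
  console.log(text.length, utf8.encode(text).length, blob.size);
}
// Prints:
// 120 120 120
// 1 2 2
```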
asked Apr 29 '15 18:04 by Mark Kasson





2 Answers

There were three factors that led to the increase in size.

Our first issue was that we were reading the file using FileReader's readAsDataURL. This reads a file and encodes it in base64, which results in a roughly 33% increase in size. We changed to readAsArrayBuffer and read into a Uint8Array (an array of 8-bit bytes).
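A sketch of reading a dropped `File` (or any `Blob`) into a `Uint8Array` instead of a data URL, plus the base64 arithmetic behind the ~33% figure. It assumes `FileReader` in the browser and falls back to `Blob.prototype.arrayBuffer()` elsewhere (e.g. Node 18+); `readAsBytes` and `base64Length` are hypothetical helper names:

```javascript
// Read a Blob or File into raw bytes. In the browser this uses
// FileReader.readAsArrayBuffer (no base64 step); where FileReader is not
// available (e.g. Node 18+) it falls back to Blob.prototype.arrayBuffer().
function readAsBytes(blob) {
  if (typeof FileReader === "undefined") {
    return blob.arrayBuffer().then((buffer) => new Uint8Array(buffer));
  }
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(new Uint8Array(reader.result));
    reader.onerror = () => reject(reader.error);
    reader.readAsArrayBuffer(blob);
  });
}

// Base64 emits 4 characters for every 3 input bytes (rounded up), which is
// where readAsDataURL's ~33% growth comes from.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

const sample = new Blob([new Uint8Array(900)], { type: "application/pdf" });
readAsBytes(sample).then((bytes) => {
  console.log(bytes.length);               // 900
  console.log(base64Length(bytes.length)); // 1200
});
```

The 900-byte sample would become 1,200 base64 characters, 4/3 of the original, before the `data:...;base64,` prefix is even added.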

We pass the file to the forge.js encryption library, which only accepts data as a string, so we had to convert the binary ArrayBuffer to a string. We used the more performant solution here. This reference is more thorough and covers the relatively new TextEncoder/TextDecoder APIs. We haven't gotten to using them yet; I'd guess they perform better since they're purely native.
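The linked solution isn't preserved here, so the following is one common approach rather than necessarily the one used: convert bytes to a binary string (one character per byte, codes 0 to 255) in chunks, and back again. The 32,768-element chunk size is an assumption meant to stay under engines' argument-count limits for `apply`. Note that `TextDecoder` is not a drop-in replacement for this step: decoding arbitrary bytes as UTF-8 replaces invalid sequences with U+FFFD, so the round trip would be lossy.

```javascript
// Bytes -> "binary string": one character per byte, char codes 0-255.
// Working in chunks keeps each String.fromCharCode.apply call below the
// engine's argument-count limit (the chunk size is an assumed safe value).
function bytesToBinaryString(bytes) {
  const CHUNK = 0x8000;
  let out = "";
  for (let i = 0; i < bytes.length; i += CHUNK) {
    out += String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK));
  }
  return out;
}

// Binary string -> bytes: keep the low 8 bits of each char code.
function binaryStringToBytes(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    bytes[i] = str.charCodeAt(i) & 0xff;
  }
  return bytes;
}

const roundTrip = binaryStringToBytes(bytesToBinaryString(new Uint8Array([0, 127, 128, 255])));
console.log(Array.from(roundTrip).join(", ")); // 0, 127, 128, 255
```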

Once forge has done the encryption, we have to convert the result to a Blob; see this on how to convert an ArrayBuffer to and from a Blob.
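Converting between `ArrayBuffer` and `Blob` is built in, assuming a runtime with `Blob.prototype.arrayBuffer()` (current browsers, Node 18+). The key point is that binary parts (an `ArrayBuffer` or typed array) are copied byte for byte, unlike string parts. `bufferToBlob` and `blobToBuffer` are hypothetical names:

```javascript
// ArrayBuffer (or typed array) -> Blob: binary parts are copied byte for
// byte, with no text encoding involved.
function bufferToBlob(buffer, type) {
  return new Blob([buffer], { type });
}

// Blob -> ArrayBuffer: Blob.prototype.arrayBuffer() returns a Promise.
function blobToBuffer(blob) {
  return blob.arrayBuffer();
}

const cipherBuffer = new Uint8Array([0x00, 0x80, 0xff, 0x10]).buffer;
const cipherBlob = bufferToBlob(cipherBuffer, "application/octet-stream");
console.log(cipherBlob.size); // 4, one byte each even for 0x80 and 0xff
blobToBuffer(cipherBlob).then((buffer) => {
  console.log(new Uint8Array(buffer).join(",")); // 0,128,255,16
});
```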

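Second, as @TechnicalChaos pointed out, we were putting a binary string into the Blob. That makes it larger because the Blob constructor encodes string data as UTF-8: characters with codes 0 to 127 take one byte each, but codes 128 to 255 take two. Encrypted output is close to random, so roughly half its bytes fall in the two-byte range, which lines up with the ~50% increase we saw.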
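The ~50% figure falls out of UTF-8 arithmetic. A sketch, assuming data whose byte values are evenly spread (encrypted output is close to that): half the values are 128 or above and take two bytes each, so the expected size is 0.5 × 1 + 0.5 × 2 = 1.5 bytes per input byte. `predictedUtf8Size` is a hypothetical helper that only handles binary strings (codes 0 to 255):

```javascript
// Predict Blob.size for a "binary string" (char codes 0-255): UTF-8 stores
// codes 0-127 in one byte and codes 128-255 in two bytes.
function predictedUtf8Size(binaryStr) {
  let size = 0;
  for (let i = 0; i < binaryStr.length; i++) {
    size += binaryStr.charCodeAt(i) < 0x80 ? 1 : 2;
  }
  return size;
}

// Every byte value exactly once: a stand-in for evenly distributed data.
let everyByte = "";
for (let b = 0; b < 256; b++) everyByte += String.fromCharCode(b);

console.log(everyByte.length);             // 256 bytes of payload
console.log(predictedUtf8Size(everyByte)); // 384
console.log(new Blob([everyByte]).size);   // 384, i.e. 1.5x
```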

The blob could then be attached to a form and uploaded to our PHP server, where it arrives in $_FILES.
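A minimal upload sketch using `FormData` and `fetch`, assuming a browser or Node 18+. The field name `file`, the filename `encrypted.bin` and the URL `upload.php` are hypothetical placeholders, not from the original post:

```javascript
// Build a multipart form holding the encrypted bytes as a file part.
// Appending a Blob always creates a file part (named "blob" if no filename
// is given), which PHP exposes through $_FILES.
function buildUploadForm(bytes, filename) {
  const form = new FormData();
  form.append("file", new Blob([bytes], { type: "application/octet-stream" }), filename);
  return form;
}

async function uploadEncrypted(url, bytes) {
  // fetch sets the multipart/form-data Content-Type (with boundary) itself.
  const response = await fetch(url, {
    method: "POST",
    body: buildUploadForm(bytes, "encrypted.bin"),
  });
  if (!response.ok) throw new Error(`Upload failed: ${response.status}`);
  return response;
}

// Usage (hypothetical): uploadEncrypted("upload.php", encryptedBytes);
// On the PHP side the payload would then be available as $_FILES["file"].
```

Sending the bytes as a file part rather than inside JSON is what avoids the base64 step on the wire.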

Now our uploads are approximately the same size as the files we encrypt.

answered Nov 15 '22 22:11 by Mark Kasson



I had a similar issue putting binary data into a JavaScript Blob. It turned out Blob was assuming UTF-8 encoding, so some of the raw data bytes ended up as multibyte characters.

The solution was to put each byte of binary data into a Uint8Array and pass that to Blob instead.
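A small comparison of the two ways to build the Blob, assuming a runtime with global `Blob` (browsers, Node 18+). The sample byte values are arbitrary:

```javascript
// Put each byte value into a Uint8Array so the Blob stores the bytes
// verbatim, instead of UTF-8 encoding a string built from them.
const rawByteValues = [0x25, 0x50, 0x44, 0x46, 0x80, 0xc3, 0xff]; // arbitrary sample bytes
const asString = String.fromCharCode(...rawByteValues);

const stringBlob = new Blob([asString]);                      // values >= 0x80 take 2 bytes
const bytesBlob = new Blob([Uint8Array.from(rawByteValues)]); // exactly 1 byte per value

console.log(stringBlob.size); // 10
console.log(bytesBlob.size);  // 7
```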

answered Nov 15 '22 20:11 by tschumann
