
Text to an Array Buffer causes files to be corrupted

I have a sample in which the user can select a file (PDF files in particular). The code converts that file to an array buffer, constructs the file back from that array buffer, and downloads it. This works as expected.

<input type="file" id="file_input" class="foo" />
<div id="output_field" class="foo"></div>


$(document).ready(function(){
    $('#file_input').on('change', function(e){
        readFile(this.files[0], function(e) {
            //manipulate with result...
            $('#output_field').text(e.target.result);
            try {
                var file = new Blob([e.target.result], { type: 'application/pdf' });
                var fileURL = window.URL.createObjectURL(file);
                var seconds = new Date().getTime() / 1000;
                var fileName = "cert" + parseInt(seconds) + ".pdf";
                var a = document.createElement("a");
                document.body.appendChild(a);
                a.style = "display: none";
                a.href = fileURL;
                a.download = fileName;
                a.click();
            }
            catch (err) {
                $('#output_field').text(err);
            }
        });     
    });
});

function readFile(file, callback){
    var reader = new FileReader();
    reader.onload = callback
    reader.readAsArrayBuffer(file);
}

Now let's say I used reader.readAsText(file); instead of reader.readAsArrayBuffer(file);. In that case I would convert the text to an array buffer and try to do the same thing.

$(document).ready(function(){
    $('#file_input').on('change', function(e){
        readFile(this.files[0], function(e) {
            //manipulate with result...
            try {
                var buf = new ArrayBuffer(e.target.result.length * 2);
                var bufView = new Uint16Array(buf);
                for (var i = 0, strLen = e.target.result.length; i < strLen; i++) {
                    bufView[i] = e.target.result.charCodeAt(i);
                }

                var file = new Blob([bufView], { type: 'application/pdf' });
                var fileURL = window.URL.createObjectURL(file);
                var seconds = new Date().getTime() / 1000;
                var fileName = "cert" + parseInt(seconds) + ".pdf";
                var a = document.createElement("a");
                document.body.appendChild(a);
                a.style = "display: none";
                a.href = fileURL;
                a.download = fileName;
                a.click();
            }
            catch (err) {
                $('#output_field').text(err);
            }
        });

    });
});

function readFile(file, callback){
    var reader = new FileReader();
    reader.onload = callback
    reader.readAsText(file);
}

Now if I pass a PDF file that is small and contains only text, this works fine, but when I select files that are large and/or have images in them, a corrupted file is downloaded.

Now I do know that I'm trying to make life harder for myself. But what I'm trying to do is somehow convert the result from readAsText() into an ArrayBuffer so that both readAsText() and readAsArrayBuffer() work identically.

user3159792 asked Apr 15 '19 14:04


1 Answer

The readAsText method doesn't simply make the bytes accessible in a UTF-16 string. Instead, it decodes them as text, according to a given text encoding format, by default UTF-8. This will mess with any binary data that you are trying to read. As you already figured out, use readAsArrayBuffer for that.

You can try to use a TextEncoder to encode your text back to a typed array, but that's not guaranteed to yield the same result: a BOM gets stripped, invalid UTF-8 sequences are replaced with U+FFFD replacement characters (so the original bytes are lost), and if you're unlucky then even Unicode normalisation will happen.
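To see the loss concretely, here is a minimal sketch that stands in for readAsText() by decoding a handful of bytes as UTF-8 with TextDecoder and then encoding them again. The byte values are an arbitrary illustration (a "%PDF" prefix followed by three bytes that are not valid UTF-8), not a real PDF, and sameBytes is a helper made up for this example.

```javascript
// Stand-in for readAsText(): decode raw bytes as UTF-8, then encode them back.
// The bytes are illustrative only: "%PDF" followed by three invalid UTF-8 bytes.
const original = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0xff, 0xfe, 0x80]);

const text = new TextDecoder('utf-8').decode(original);
const roundTripped = new TextEncoder().encode(text);

// Hypothetical helper: compare two byte arrays element by element.
function sameBytes(a, b) {
  return a.length === b.length && a.every((value, i) => value === b[i]);
}

console.log(original.length, roundTripped.length); // 7 13
console.log(sameBytes(original, roundTripped));    // false
```

Each invalid byte comes back as the three-byte UTF-8 encoding of U+FFFD, so the result is longer than the input and the original values cannot be recovered. Compressed streams and images in a PDF are full of such bytes, which would fit with the corruption showing up in larger files.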

It might get easier if you explicitly specify a single-byte decoding, but really you should just use readAsArrayBuffer.
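For illustration only, one way the single-byte route could look is sketched below. It assumes the file is read with the 'x-user-defined' label, which the Encoding Standard defines as mapping bytes 0x00-0x7F to the same code points and bytes 0x80-0xFF to U+F780-U+F7FF, so the low 8 bits of each code unit give back the byte. The helper name singleByteTextToBytes is hypothetical, and whether a given browser honours that label in readAsText() is an assumption to verify.

```javascript
// Hypothetical helper: turn a string produced by a single-byte decoding back
// into bytes by keeping the low 8 bits of every UTF-16 code unit.
// Note the Uint8Array: one byte per character, not two as with Uint16Array.
function singleByteTextToBytes(text) {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    bytes[i] = text.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// Browser-only usage sketch (assumes 'x-user-defined' is honoured):
// reader.onload = function (e) {
//   var bytes = singleByteTextToBytes(e.target.result);
//   var file = new Blob([bytes], { type: 'application/pdf' });
// };
// reader.readAsText(file, 'x-user-defined');
```

This only works because that particular decoding is a one-to-one mapping of bytes to code units. It does not help with the default UTF-8 decoding, and readAsArrayBuffer remains the simpler choice.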

Bergi answered Sep 30 '22 19:09