I am currently working with the HTML5 File API, and I need to get binary file data.
The FileReader's readAsText and readAsDataURL methods work fine, but readAsBinaryString returns the same data as readAsText.
I need binary data, but I'm getting a text string. Am I missing something?
2022 update: See the explanation below for why the OP was seeing what they were seeing, but the code there is outdated. In modern environments, you'd use the methods on the Blob interface (which File inherits):
- arrayBuffer for reading binary data (which you can then access via any of the typed arrays)
- text to read textual data
- stream for getting a ReadableStream for handling data via streaming (which allows you to do multiple transformations on the data without making multiple passes through it and/or use the data without having to keep all of it in memory)
Once you have the file from the file input (const file = fileInput.files[0] or similar), it's literally just a matter of:
await file.text(); // To read its text
// or
await file.arrayBuffer(); // To read its contents into an array buffer
(See ReadableStream for an example of streams.)
You might access the array buffer via a Uint8Array (new Uint8Array(buffer)).
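As a rough sketch of the stream option listed above, the following reads a Blob chunk by chunk rather than loading it all into memory. The helper name countBytes and the in-memory sample Blob are illustrative assumptions, not part of the original answer; a File taken from a file input should behave the same way, since File inherits from Blob.

```javascript
// Hypothetical helper (not from the original answer): counts the bytes in a
// Blob or File by reading it chunk by chunk via its ReadableStream.
async function countBytes(blob) {
  const reader = blob.stream().getReader();
  let total = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) {
      break;
    }
    // Each chunk arrives as a Uint8Array
    total += value.byteLength;
  }
  return total;
}

// Usage: an in-memory Blob stands in for a File from a file input
const sampleBlob = new Blob([new Uint8Array([0xff, 0xfe, 0x54, 0x00])]);
countBytes(sampleBlob).then(total => {
  console.log(`Read ${total} bytes`);
});
```

The same loop is where you would process each chunk (hashing, parsing, and so on) instead of just counting it.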
Here's an example of text and arrayBuffer:
const $ = id => document.getElementById(id);
const fileInput = $("fileInput");
const btnRead = $("btnRead");
const rdoText = $("rdoText");
const contentsDiv = $("contents");

// Only enable the button once a file has been chosen
const updateButton = () => {
    btnRead.disabled = fileInput.files.length === 0;
};

const readTextFile = async (file) => {
    const text = await file.text();
    contentsDiv.textContent = text;
    contentsDiv.classList.add("text");
    contentsDiv.classList.remove("binary");
    console.log("Done reading text file");
};

const readBinaryFile = async (file) => {
    // Read the file's contents into an array buffer
    const buffer = await file.arrayBuffer();
    // Get a byte array for that buffer
    const bytes = new Uint8Array(buffer);
    // Show it as hex text, 16 bytes per line
    const lines = [];
    let line = [];
    bytes.forEach((byte, index) => {
        const hex = byte.toString(16).padStart(2, "0");
        line.push(hex);
        if (index % 16 === 15) {
            lines.push(line.join(" "));
            line = [];
        }
    });
    // Flush the final partial line (when the length isn't a multiple of 16)
    if (line.length > 0) {
        lines.push(line.join(" "));
    }
    contentsDiv.textContent = lines.join("\n");
    contentsDiv.classList.add("binary");
    contentsDiv.classList.remove("text");
    console.log(`Done reading binary file (length: ${bytes.length})`);
};

updateButton();
fileInput.addEventListener("input", updateButton);

btnRead.addEventListener("click", () => {
    const file = fileInput.files[0];
    if (!file) {
        return;
    }
    const readFile = rdoText.checked ? readTextFile : readBinaryFile;
    readFile(file)
    .catch(error => {
        console.error(`Error reading file:`, error);
    });
});
body {
    font-family: sans-serif;
}
#contents {
    font-family: monospace;
    white-space: pre;
}
<form>
    <div>
        <label>
            <span>File:</span>
            <input type="file" id="fileInput">
        </label>
    </div>
    <div>
        <label>
            <input id="rdoText" type="radio" name="format" value="text" checked>
            Text
        </label>
        <label>
            <input id="rdoBinary" type="radio" name="format" value="binary">
            Binary
        </label>
    </div>
    <div>
        <input id="btnRead" type="button" value="Read File">
    </div>
</form>
<div id="contents"></div>
Note in 2018: readAsBinaryString is outdated. For use cases where previously you'd have used it, these days you'd use readAsArrayBuffer (or in some cases, readAsDataURL) instead.
readAsBinaryString says that the data must be represented as a binary string, where:
...every byte is represented by an integer in the range [0..255].
JavaScript originally didn't have a "binary" type (typed arrays came later: first through the Khronos Typed Array specification written for WebGL* (details below), and then standardized in ECMAScript 2015 together with ArrayBuffer) and so they went with a String with the guarantee that no character stored in the String would be outside the range 0..255. (They could have gone with an array of Numbers instead, but they didn't; perhaps large Strings are more memory-efficient than large arrays of Numbers, since Numbers are floating-point.)
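To make the "binary string" idea concrete, here is a minimal sketch of getting the raw byte values out of such a string, and of going back the other way. The function names binaryStringToBytes and bytesToBinaryString are made up for illustration; only charCodeAt, String.fromCharCode, and Uint8Array are standard.

```javascript
// Hypothetical helper: converts a "binary string" (every char code in 0..255)
// into a Uint8Array holding the same byte values.
function binaryStringToBytes(binaryString) {
  const bytes = new Uint8Array(binaryString.length);
  for (let i = 0; i < binaryString.length; i++) {
    const code = binaryString.charCodeAt(i);
    if (code > 255) {
      // Not a binary string in the File API sense
      throw new RangeError(`Char code ${code} at index ${i} is above 255`);
    }
    bytes[i] = code;
  }
  return bytes;
}

// Hypothetical helper: the reverse mapping, one "character" per byte.
function bytesToBinaryString(bytes) {
  let result = "";
  for (const byte of bytes) {
    result += String.fromCharCode(byte);
  }
  return result;
}

// Usage: the first four bytes of a UTF-16LE file starting with "T"
const exampleBytes = binaryStringToBytes("\xff\xfeT\x00");
console.log(Array.from(exampleBytes));
```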
If you're reading a file that's mostly text in a western script (mostly English, for instance), then that string is going to look a lot like text. If you read a file with Unicode characters in it, you should notice a difference, since JavaScript strings are UTF-16** (details below) and so some characters will have values above 255, whereas a "binary string" according to the File API spec wouldn't have any values above 255 (you'd have two individual "characters" for the two bytes of the Unicode code point).
If you're reading a file that's not text at all (an image, perhaps), you'll probably still get a very similar result between readAsText and readAsBinaryString, but with readAsBinaryString you know that there won't be any attempt to interpret multi-byte sequences as characters. You don't know that if you use readAsText, because readAsText will use an encoding determination to try to figure out what the file's encoding is and then map it to JavaScript's UTF-16 strings.
You can see the effect if you create a file and store it in something other than ASCII or UTF-8. (In Windows you can do this via Notepad; the "Save As" dialog has an encoding drop-down with "Unicode" on it, by which, looking at the data, they seem to mean UTF-16; I'm sure Mac OS and *nix editors have a similar feature.) Here's a page that dumps the result of reading a file both ways:
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8">
<title>Show File Data</title>
<style type='text/css'>
body {
    font-family: sans-serif;
}
</style>
<script type='text/javascript'>

    function loadFile() {
        var input, file, fr;

        if (typeof window.FileReader !== 'function') {
            bodyAppend("p", "The file API isn't supported on this browser yet.");
            return;
        }

        input = document.getElementById('fileinput');
        if (!input) {
            bodyAppend("p", "Um, couldn't find the fileinput element.");
        }
        else if (!input.files) {
            bodyAppend("p", "This browser doesn't seem to support the `files` property of file inputs.");
        }
        else if (!input.files[0]) {
            bodyAppend("p", "Please select a file before clicking 'Load'");
        }
        else {
            file = input.files[0];
            fr = new FileReader();
            fr.onload = receivedText;
            fr.readAsText(file);
        }

        function receivedText() {
            showResult(fr, "Text");

            fr = new FileReader();
            fr.onload = receivedBinary;
            fr.readAsBinaryString(file);
        }

        function receivedBinary() {
            showResult(fr, "Binary");
        }
    }

    function showResult(fr, label) {
        var markup, result, n, aByte, byteStr;

        markup = [];
        result = fr.result;
        for (n = 0; n < result.length; ++n) {
            aByte = result.charCodeAt(n);
            byteStr = aByte.toString(16);
            if (byteStr.length < 2) {
                byteStr = "0" + byteStr;
            }
            markup.push(byteStr);
        }
        bodyAppend("p", label + " (" + result.length + "):");
        bodyAppend("pre", markup.join(" "));
    }

    function bodyAppend(tagName, innerHTML) {
        var elm;

        elm = document.createElement(tagName);
        elm.innerHTML = innerHTML;
        document.body.appendChild(elm);
    }

</script>
</head>
<body>
<form action='#' onsubmit="return false;">
<input type='file' id='fileinput'>
<input type='button' id='btnLoad' value='Load' onclick='loadFile();'>
</form>
</body>
</html>
If I use that with a "Testing 1 2 3" file stored in UTF-16, here are the results I get:
Text (13):
54 65 73 74 69 6e 67 20 31 20 32 20 33
Binary (28):
ff fe 54 00 65 00 73 00 74 00 69 00 6e 00 67 00 20 00 31 00 20 00 32 00 20 00 33 00
As you can see, readAsText interpreted the characters and so I got 13 (the length of "Testing 1 2 3"), and readAsBinaryString didn't, and so I got 28 (the two-byte BOM plus two bytes for each character).
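The same 13-versus-28 difference can be reproduced without a file, as a rough sketch using typed arrays and TextDecoder. The helpers encodeUtf16LeWithBom and decodeUtf16Le are hypothetical names written for this illustration. The BOM is stripped by hand here so the sketch doesn't lean on the decoder's own BOM handling.

```javascript
// Hypothetical helper: builds the bytes a UTF-16LE file with a BOM would
// contain for the given text (one 16-bit code unit per JavaScript "character").
function encodeUtf16LeWithBom(text) {
  const bytes = new Uint8Array(2 + text.length * 2);
  bytes[0] = 0xff; // BOM, low byte first
  bytes[1] = 0xfe;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    bytes[2 + i * 2] = unit & 0xff; // low byte
    bytes[3 + i * 2] = unit >> 8;   // high byte
  }
  return bytes;
}

// Hypothetical helper: decodes UTF-16LE bytes to a string, dropping a
// leading BOM if one is present.
function decodeUtf16Le(bytes) {
  const hasBom = bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe;
  const payload = hasBom ? bytes.subarray(2) : bytes;
  return new TextDecoder("utf-16le").decode(payload);
}

const rawBytes = encodeUtf16LeWithBom("Testing 1 2 3");
const decodedText = decodeUtf16Le(rawBytes);
console.log(`Binary length: ${rawBytes.length}, text length: ${decodedText.length}`);
```

The raw byte count corresponds to what readAsBinaryString (or arrayBuffer) reports, and the decoded length to what readAsText reports for the same file.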
* XMLHttpRequest.response with responseType = "arraybuffer" is supported in HTML 5.
** "JavaScript strings are UTF-16" may seem like an odd statement; aren't they just Unicode? No, a JavaScript string is a series of UTF-16 code units; you see surrogate pairs as two individual JavaScript "characters" even though, in fact, the surrogate pair as a whole is just one character. See the link for details.