In a web page, I have to read a small part of a file, and do this for many (1,500 to 12,000) small files, each roughly 1 MB in size. Once I have collected the information I need, I push it back to the server.
My problem: I use the FileReader API, garbage collection does not kick in, and memory consumption explodes.
Code goes as:
function extract_information_from_files(input_files) {
    // some dummy implementation
    for (var i = 0; i < input_files.length; ++i) {
        (function dummy_function(file) {
            var reader = new FileReader();
            reader.onload = function () {
                // convert to Uint8Array because the used library expects this
                var array_buffer = new Uint8Array(reader.result);
                // do some fancy stuff with the library (a very small subset of the data is kept)
                // finish
                // the function call ends here; expect garbage collection to start cleaning
                // even explicit dereferencing does not work
            };
            reader.readAsArrayBuffer(file);
        })(input_files[i]);
    }
}
Some remarks:
One last strange detail (posted for completeness): when I use FileReader combined with https://gildas-lormeau.github.io/zip.js/, reading each File just before pushing it into a zip archive, garbage collection just works.
All these remarks seem to point towards me not using FileReader as it should be used, so please tell me how.
The problem may be related to the order of execution. In your for loop you are reading all files with reader.readAsArrayBuffer(file). This code will run before any onload is run for a reader. Depending on the browser's implementation of FileReader, this can mean the browser loads the entire file (or simply preallocates the buffer for the entire file) before any onload is called.
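A minimal sketch of that ordering, with console.log calls added purely for illustration: every read is started synchronously by the loop before a single onload has had a chance to fire, so all the buffers can be in flight at once.

function illustrate_ordering(input_files) {
    for (var i = 0; i < input_files.length; ++i) {
        var reader = new FileReader();
        reader.onload = function () {
            console.log('onload fired'); // runs only after the loop has finished
        };
        reader.readAsArrayBuffer(input_files[i]);
        console.log('read started');     // logged input_files.length times first
    }
}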
Try to process files like a queue and see if it makes a difference. Something like:
function extract_information_from_files(input_files) {
    var reader = new FileReader();
    function process_one() {
        // note: pop() requires a real Array, not a live FileList
        var single_file = input_files.pop();
        if (single_file === undefined) {
            return;
        }
        (function dummy_function(file) {
            // var reader = new FileReader(); // moved out so the single reader is reused
            reader.onload = function () {
                // do your stuff
                // process the next file at the end
                process_one();
            };
            reader.readAsArrayBuffer(file);
        })(single_file);
    }
    process_one();
}
extract_information_from_files(file_array_1);
// uncomment the next line to process another file array in parallel
// extract_information_from_files(file_array_2);
EDIT: It seems that browsers expect you to reuse FileReaders. I've edited the code to reuse a single reader and verified (in Chrome) that memory usage stays limited to the size of the largest file you read.
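As a variation on the same queue pattern: since the question only needs a small part of each file, reading just a slice bounds each allocation independently of reader reuse. Blob.slice() itself is standard; the function name and the prefix length below are assumptions for illustration.

function extract_from_prefixes(input_files, prefix_bytes) {
    var reader = new FileReader();
    function process_one() {
        var single_file = input_files.pop();
        if (single_file === undefined) {
            return;
        }
        reader.onload = function () {
            var bytes = new Uint8Array(reader.result);
            // ... extract what you need from `bytes` ...
            process_one();
        };
        // Blob.slice(start, end) reads only the first prefix_bytes bytes,
        // so each buffer is at most prefix_bytes instead of the whole file.
        reader.readAsArrayBuffer(single_file.slice(0, prefix_bytes));
    }
    process_one();
}
// extract_from_prefixes(file_array_1, 4096); // 4096 is an assumed prefix size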