
FileReader: reading many files with javascript without memory leaks

In a web page, I have to read a small part of each of many (1,500 to 12,000) small files, each roughly 1 MB. Once I have collected the information I need, I push it back to the server.

My problem: I use the FileReader API, garbage collection does not seem to run, and memory consumption explodes.

Code goes as:

function extract_information_from_files(input_files) {
    // some dummy implementation
    for (var i = 0; i < input_files.length; ++i) {

        (function dummy_function(file) {

            var reader = new FileReader();

            reader.onload = function () {
                // convert to Uint8Array because the library used expects this
                var array_buffer = new Uint8Array(reader.result);

                // do some fancy stuff with the library (only a very small subset of the data is kept)

                // finish

                // the function call ends here; I expect garbage collection to start cleaning.
                // even explicit dereferencing does not work
            };

            reader.readAsArrayBuffer(file);

        })(input_files[i]);
    }
}

Some remarks:

  • No, at first sight the library does not seem to keep any references to the loaded objects. Even if you run the code exactly as shown above, with array_buffer not used at all, everything is kept in memory.
  • The behavior varies by browser:
      • Chrome (43) does not clear anything at all.
      • Firefox (38) seems to settle at a residual memory usage of about 1/3 of the combined size of all files.
  • I found very, very few topics discussing the same issue on the Internet. The ones I tried were:
      • Is it possible to clean memory after FileReader? -> Old; File.prototype.mozSlice has since changed to .slice, but even then the problem remains.
      • http://www.joelandritsch.com/posts/lessons-learned-in-javascript-11 -> The proposed solution does not work.
      • https://developer.mozilla.org/en-US/docs/Web/JavaScript/Memory_Management is not very clear to me. At first it seems that you do not need to de-reference explicitly (see "object is not needed" vs. "object is not reachable"), but then it also states "Limitation: objects need to be made explicitly unreachable".
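Since only a small part of each file is actually needed, one way to keep the allocations small (independent of the garbage-collection question) is to slice the File before reading it, so only the slice is ever buffered. This is a minimal sketch, not code from the question; it assumes a modern environment where Blob.prototype.arrayBuffer() is available, and the 1024-byte header size is a placeholder:

```javascript
// Read only the first `headerBytes` of a Blob/File.
// Slicing does not copy data; only the slice is allocated when read.
async function readHeader(blob, headerBytes = 1024) {
    const slice = blob.slice(0, headerBytes); // cheap: no data copied yet
    const buffer = await slice.arrayBuffer(); // allocates only the slice
    return new Uint8Array(buffer);
}
```

A File is a Blob, so the same `file.slice(0, 1024)` can also be handed to FileReader.readAsArrayBuffer; the reader then never holds more than the slice.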

One last strange detail (posted for completeness): when I use FileReader combined with https://gildas-lormeau.github.io/zip.js/, reading a File just before pushing it to a zip archive, garbage collection just works.

All these remarks seem to point towards me not using FileReader as it should be used, so please tell me how.

asked Nov 01 '22 by cecemel

1 Answer

The problem may be related to the order of execution. In your for loop you call reader.readAsArrayBuffer(file) for every file, and all of those calls run before any reader's onload fires. Depending on the browser's implementation of FileReader, this can mean the browser loads every file (or at least preallocates a buffer for each entire file) before any onload is called.

Try to process files like a queue and see if it makes a difference. Something like:

function extract_information_from_files(input_files) {
    var reader = new FileReader();

    function process_one() {
        var single_file = input_files.pop();
        if (single_file === undefined) {
            return;
        }

        (function dummy_function(file) {
            //var reader = new FileReader();

            reader.onload = function () {
                // do your stuff
                // process next at the end
                process_one();
            };

            reader.readAsArrayBuffer(file);
        })(single_file);
    }

    process_one();
}

extract_information_from_files(file_array_1);
// uncomment next line to process another file array in parallel
// extract_information_from_files(file_array_2);

EDIT: It seems that browsers expect you to reuse FileReaders. I've edited the code to reuse a single reader and tested (in Chrome) that memory usage stays limited to the size of the largest file you read.
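The same one-at-a-time queue can also be written with Promises. This is a sketch of the pattern, not the answer's exact code: readOne is a stand-in for the FileReader call (e.g. reader.onload wrapped in a Promise) and is passed in so the loop itself stays generic and testable:

```javascript
// Process files strictly one after another: at most one buffer is
// alive at a time, so each becomes collectable before the next read.
async function processSequentially(files, readOne, extract) {
    const results = [];
    for (const file of files) {
        const buffer = await readOne(file); // e.g. FileReader wrapped in a Promise
        results.push(extract(buffer));      // keep only the small subset needed
        // `buffer` goes out of scope at the end of each iteration
    }
    return results;
}
```

In the browser, readOne could reuse a single FileReader, e.g. file => new Promise(resolve => { reader.onload = () => resolve(reader.result); reader.readAsArrayBuffer(file); }).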

answered Nov 12 '22 by m4ktub