 

filereader api on big files

My FileReader API code had been working well until one of my clients sent me a 280MB txt file. The page crashes outright in Chrome, and in Firefox nothing happens at all.

// create a new reader object
var fileReader = new FileReader();

// read the file as text
fileReader.readAsText($files[i]);
fileReader.onload = function(e) {
    // read all the information about the file
    // do sanity checks here etc...
    $timeout(function() {
        // var fileContent = e.target.result;
        // get the first line
        var firstLine = e.target.result.slice(0, e.target.result.indexOf("\n"));
    });
};

What I am trying to do above is find the first line break so that I can determine the column length of the file. Should I not read it as text? How can I get the column length of the file without crashing the page on big files?

ODelibalta asked Sep 12 '14

People also ask

How do I use FileReader API?

FileReader can only access the contents of files that the user has explicitly selected, either using an HTML <input type="file"> element or by drag and drop. It cannot be used to read a file by pathname from the user's file system. To read files on the client's file system by pathname, use the File System Access API.

Is FileReader asynchronous?

The FileReader methods work asynchronously but don't return a Promise. Attempting to retrieve the result immediately after calling a method will not work: the .onload event handler fires only after the FileReader has successfully finished reading the file and updated the FileReader's .result property.

What does FileReader return?

The FileReader result property returns the file's contents. This property is only valid after the read operation is complete, and the format of the data depends on which of the methods was used to initiate the read operation.

How does FileReader work JavaScript?

JavaScript uses the FileList object to hold File objects. To read the content of a file, you use a FileReader object. Note that the FileReader can only access files you selected via drag & drop or a file input.


2 Answers

Your application is failing for big files because you're reading the full file into memory before processing it. This inefficiency can be solved by streaming the file (reading chunks of a small size), so you only need to hold a part of the file in memory.

A File object is also an instance of a Blob, which offers the .slice method to create a smaller view of the file.

Here is an example that assumes that the input is ASCII (demo: http://jsfiddle.net/mw99v8d4/).

function findColumnLength(file, callback) {
    // 1 KB at a time, because we expect that the column will probably be small.
    var CHUNK_SIZE = 1024;
    var offset = 0;
    var fr = new FileReader();
    fr.onload = function() {
        var view = new Uint8Array(fr.result);
        for (var i = 0; i < view.length; ++i) {
            if (view[i] === 10 || view[i] === 13) {
                // \n = 10 and \r = 13
                // column length = offset + position of \r or \n
                callback(offset + i);
                return;
            }
        }
        // \r or \n not found, continue seeking.
        offset += CHUNK_SIZE;
        seek();
    };
    fr.onerror = function() {
        // Cannot read file... Do something, e.g. assume column size = 0.
        callback(0);
    };
    seek();

    function seek() {
        if (offset >= file.size) {
            // No \r or \n found. The column size is equal to the full
            // file size.
            callback(file.size);
            return;
        }
        var slice = file.slice(offset, offset + CHUNK_SIZE);
        fr.readAsArrayBuffer(slice);
    }
}

The previous snippet counts the number of bytes before a line break. Counting the number of characters in a text consisting of multibyte characters is slightly more difficult, because you have to account for the possibility that the last byte in the chunk could be a part of a multibyte character.
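One way to handle that safely for UTF-8 input is a streaming TextDecoder, which buffers the trailing bytes of an incomplete character until the next chunk arrives (a sketch, not part of the answer above):

```javascript
// "é" is two bytes in UTF-8 (0xC3 0xA9); split it across two "chunks".
var bytes = new TextEncoder().encode("é\nrest");
var decoder = new TextDecoder("utf-8");

// stream: true holds the incomplete byte back instead of emitting
// a replacement character.
var part1 = decoder.decode(bytes.slice(0, 1), { stream: true }); // ""
var part2 = decoder.decode(bytes.slice(1)); // the final call flushes: "é\nrest"

var text = part1 + part2;
console.log(text.slice(0, text.indexOf("\n"))); // "é"
```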

Rob W answered Sep 28 '22


There is an awesome library called Papa Parse that does this in a graceful way! It can really handle big files, and you can also run it in a web worker.

Just try out the demos that they provide: https://www.papaparse.com/demo
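A sketch of what that configuration might look like (step, preview, and worker are documented Papa Parse config options; the `file` variable is assumed to come from a file input):

```javascript
// Papa Parse streams the file row by row, so the whole file never sits
// in memory. preview: 1 stops after the first row, which is enough to
// get the column count.
var config = {
    worker: true,   // parse off the main thread in a web worker
    preview: 1,     // stop after the first row
    step: function (row) {
        console.log("Column count:", row.data.length);
    }
};

// In the browser:
// Papa.parse(file, config);
```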

Edy Segura answered Sep 28 '22