
Using Papa Parse for big CSV files

I am trying to load a file that has about 100k lines, and so far the browser has been crashing (locally). I looked around and saw that Papa Parse seems to handle large files. With it, loading the file into the textarea is down to about 3-4 minutes. Once the file is loaded, I want to run some more jQuery to do counts and similar processing, so the whole process takes a while. Is there a way to make the CSV load faster? Am I using the library correctly?

<div id="tabs">
<ul>
  <li><a href="#tabs-4">Generate a Report</a></li>
</ul>
<div id="tabs-4">
  <h2>Generating a CSV report</h2>
  <h4>Input Data:</h4>      
  <input id="myFile" type="file" name="files" value="Load File" />
  <button onclick="loadFileAsText()">Load Selected File</button>
  <form action="./" method="post">
  <textarea id="input3" style="height:150px;"></textarea>

  <input id="run3" type="button" value="Run" />
  <input id="runSplit" type="button" value="Run Split" />
  <input id="downloadLink" type="button" value="Download" />
  </form>
</div>
</div>

$(function () {
    $("#tabs").tabs();
});

var data = $('#input3').val();

function handleFileSelect(evt) {
    var file = evt.target.files[0];

Papa.parse(file, {
    header: true,
    dynamicTyping: true,
    complete: function (results) {
        data = results;
    }
});
}

$(document).ready(function () {

    $('#myFile').change(function(handleFileSelect){

    });
});


function loadFileAsText() {
    var fileToLoad = document.getElementById("myFile").files[0];

    var fileReader = new FileReader();
    fileReader.onload = function (fileLoadedEvent) {
        var textFromFileLoaded = fileLoadedEvent.target.result;
        document.getElementById("input3").value = textFromFileLoaded;
    };
    fileReader.readAsText(fileToLoad, "UTF-8");
}
Keith asked Jun 29 '16 12:06


2 Answers

You are probably using it correctly; it will simply take some time to parse all 100k lines.
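One detail in the question's code may be worth checking, though. `$('#myFile').change(function(handleFileSelect){ })` registers an empty function whose parameter merely happens to be named handleFileSelect, so the real handleFileSelect (and with it Papa.parse) is never called, and the only code doing work is loadFileAsText filling the textarea. A minimal sketch of the wiring, assuming jQuery and Papa Parse are loaded on the page as in the question; the empty-selection guard is an addition, not part of the original code:

```javascript
// Filled in once parsing completes.
var data;

function handleFileSelect(evt) {
    var file = evt.target.files[0];
    if (!file) {
        return; // nothing selected (guard added for this sketch)
    }

    Papa.parse(file, {
        header: true,
        dynamicTyping: true,
        complete: function (results) {
            data = results; // results.data holds the parsed rows
        }
    });
}

// Only wire up the DOM when jQuery is actually present.
if (typeof $ !== 'undefined') {
    $(function () {
        // Pass the handler itself rather than wrapping it in an empty function.
        $('#myFile').on('change', handleFileSelect);
    });
}
```

With this in place the file is parsed as soon as it is selected, and the counts can be computed from data.data without putting the raw text into the textarea.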

This is probably a good use case for Web Workers.

NOTE: Per @tomByrer's answer below, Papa Parse now has support for Web Workers out of the box. This may be a better approach than rolling your own worker.

If you've never used them before, this site gives a decent rundown, but the key part is:

Web Workers mimics multithreading, allowing intensive scripts to be run in the background so they do not block other scripts from running. Ideal for keeping your UI responsive while also performing processor-intensive functions.

Browser coverage is pretty decent as well, with IE9 and below being the only semi-modern browsers that don't support it.

Mozilla has a good video that shows how web workers can speed up frame rate on a page as well.

I'll try to get a working example with web workers for you, but also note that this won't speed up the script, it'll just make it process asynchronously so your page stays responsive.

EDIT:

(NOTE: if you want to parse the CSV within the worker, you'll probably need to import the Papa Parse script within worker.js using the importScripts function (which is globally defined within the worker thread). See the MDN page for more info on that.)
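As a rough sketch of that variant (not the code from the working example), worker.js could load Papa Parse with importScripts and post back only a small summary. The CDN URL is the one used in csv.html; the summarizeResults helper and the shape of the summary object are hypothetical names chosen for illustration.

```javascript
// worker.js (sketch): parse inside the worker, post back a summary only.

// Hypothetical helper: reduce a Papa Parse result object to a few numbers.
function summarizeResults(results) {
    return {
        rowCount: results.data.length,
        errorCount: results.errors.length,
        fields: results.meta.fields || []
    };
}

// importScripts only exists inside a worker, so this part is skipped elsewhere.
if (typeof importScripts === 'function') {
    importScripts('https://cdnjs.cloudflare.com/ajax/libs/PapaParse/4.1.2/papaparse.js');

    self.addEventListener('message', function (e) {
        // e.data.file is the File object posted from the page
        Papa.parse(e.data.file, {
            header: true,
            dynamicTyping: true,
            complete: function (results) {
                self.postMessage(summarizeResults(results));
            }
        });
    }, false);
}
```

The page would create the worker and post the file as in csv.html, then read ev.data.rowCount in its message handler instead of calling Papa.parse itself.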

Here is my working example:

csv.html

<!doctype html>
<html>
<head>
  <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.0.0/jquery.min.js"></script>
  <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/PapaParse/4.1.2/papaparse.js"></script>
</head>

<body>
  <input id="myFile" type="file" name="files" value="Load File" />
  <br>
  <button class="load-file">Load and Parse Selected CSV File</button>
  <div id="report"></div>

<script>
// initialize our parsed_csv to be used wherever we want
var parsed_csv;
var start_time, end_time;

// document.ready
$(function() {

  $('.load-file').on('click', function(e) {
    start_time = performance.now();
    $('#report').text('Processing...');

    console.log('initialize worker');

    var worker = new Worker('worker.js');
    worker.addEventListener('message', function(ev) {
      console.log('received raw CSV, now parsing...');

      // Parse our CSV raw text
      Papa.parse(ev.data, {
        header: true,
        dynamicTyping: true,
        complete: function (results) {
          // Save result in a globally accessible var
          parsed_csv = results;
          console.log('parsed CSV!');
          console.log(parsed_csv);

          $('#report').text(parsed_csv.data.length + ' rows processed');
          end_time = performance.now();
          console.log('Took ' + (end_time - start_time) + ' milliseconds to load and process the CSV file.');
        }
      });

      // Terminate our worker
      worker.terminate();
    }, false);

    // Submit our file to load
    var file_to_load = document.getElementById("myFile").files[0];

    console.log('call our worker');
    worker.postMessage({file: file_to_load});
  });

});
</script>
</body>

</html>

worker.js

self.addEventListener('message', function(e) {
    console.log('worker is running');

    var file = e.data.file;
    var reader = new FileReader();

    reader.onload = function (fileLoadedEvent) {
        console.log('file loaded, posting back from worker');

        var textFromFileLoaded = fileLoadedEvent.target.result;

        // Post our text file back from the worker
        self.postMessage(textFromFileLoaded);
    };

    // Actually load the text file
    reader.readAsText(file, "UTF-8");
}, false);

GIF of it processing, takes less than a second (all running locally)


romellem answered Oct 12 '22 21:10


As of v5, Papa Parse has Web Worker support built in.

A simple example of invoking the worker within Papa Parse:

Papa.parse(bigFile, {
    worker: true,
    step: function(results) {
        console.log("Row:", results.data);
    }
});

If you already have your own worker set up around Papa Parse, there is no need to re-implement it, but for future projects some may find Papa Parse's built-in option easier to use.
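Since the question mentions doing counts after loading, the step callback can also tally values row by row, so the full file never has to sit in a textarea. This is only a sketch: it assumes Papa Parse v5 (where results.data in step is a single row object when header is true), and both the makeColumnCounter helper and the 'status' column name are hypothetical.

```javascript
// Hypothetical helper: count how often each value appears in one column.
function makeColumnCounter(column) {
    var counts = {};
    return {
        add: function (row) {
            var key = String(row[column]);
            counts[key] = (counts[key] || 0) + 1;
        },
        counts: counts
    };
}

// Browser-only wiring; skipped when Papa Parse or the DOM is not available.
if (typeof Papa !== 'undefined' && typeof document !== 'undefined') {
    document.getElementById('myFile').addEventListener('change', function (evt) {
        var counter = makeColumnCounter('status'); // 'status' is an assumed column name

        Papa.parse(evt.target.files[0], {
            header: true,
            worker: true,
            step: function (results) {
                counter.add(results.data); // one row per call (v5 behaviour)
            },
            complete: function () {
                console.log(counter.counts);
            }
        });
    });
}
```

Because each row is consumed as it arrives, memory use stays roughly flat regardless of file size; the trade-off is that the rows themselves are not kept unless the step callback stores them.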

tomByrer answered Oct 12 '22 22:10