Parse Remote CSV File using Nodejs / Papa Parse?

I am currently working on parsing a remote csv product feed from a Node app and would like to use Papa Parse to do that (as I have had success with it in the browser in the past).

Papa Parse Github: https://github.com/mholt/PapaParse

My initial attempts and web searching haven't turned up exactly how this would be done. The Papa readme says that Papa Parse is now compatible with Node and as such Baby Parse (which used to serve some of the Node parsing functionality) has been deprecated.

Here's a link to the Node section of the docs for anyone stumbling on this issue in the future: https://github.com/mholt/PapaParse#papa-parse-for-node

From that doc paragraph it looks like Papa Parse in Node can parse a readable stream instead of a File. My question is:

Is there any way to use the Readable Stream functionality to have Papa download and parse a remote CSV in Node, somewhat similar to how Papa in the browser uses XMLHttpRequest to accomplish that same goal?

For future visibility: for those searching on the topic (and to avoid repeating a similar question), attempting to use the remote file parsing functionality described here: http://papaparse.com/docs#remote-files will result in the following error in your console:

"Unhandled rejection ReferenceError: XMLHttpRequest is not defined"

I have opened an issue on the official repository and will update this Question as I learn more about the problems that need to be solved.

Necevil asked Dec 14 '17 22:12


1 Answer

After lots of tinkering I finally got a working example of this using asynchronous streams, with no libraries beyond papaparse itself apart from request (for remote files) or Node's built-in fs module (for local files). It works for remote and local files.

I needed to create a data stream, as well as a PapaParse stream (using papa.NODE_STREAM_INPUT as the first argument to papa.parse()), then pipe the data into the PapaParse stream. Event listeners need to be implemented for the data and finish events on the PapaParse stream. You can then use the parsed data inside your handler for the finish event.

See the example below:

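const papa = require("papaparse");
const request = require("request");

const options = {/* options */};

// Readable stream of the remote file's contents
const dataStream = request.get("https://example.com/myfile.csv");
// Stream that takes CSV text on its writable side and emits parsed rows
const parseStream = papa.parse(papa.NODE_STREAM_INPUT, options);

// Report download failures instead of letting them go unhandled
dataStream.on("error", err => console.error(err));

dataStream.pipe(parseStream);

// Collect each parsed chunk as it arrives
const data = [];
parseStream.on("data", chunk => {
    data.push(chunk);
});

// Once all input has been written to the parser, use the collected rows
parseStream.on("finish", () => {
    console.log(data);
    console.log(data.length);
});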

In my testing, the data event on parseStream fired once for each row in the CSV (though I'm not sure this behaviour is guaranteed). Hope this helps someone!
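If you would rather await the rows than work inside event handlers, the same pipe-and-collect pattern can be wrapped in a Promise. This is only a rough sketch: collectRows is a hypothetical helper name (not part of Papa Parse), and it resolves on the parser's end event rather than finish, on the assumption that end is the point where every row has been read out. Swap in finish if that matches your setup better.

```javascript
// Hypothetical helper (not a Papa Parse API): pipes `source` into `parser`
// and resolves with every chunk the parser emits.
function collectRows(source, parser) {
    return new Promise((resolve, reject) => {
        const rows = [];

        // Fail the promise if either side of the pipe errors
        source.on("error", reject);
        parser.on("error", reject);

        // Gather each parsed chunk as it is emitted
        parser.on("data", row => rows.push(row));

        // Assumption: "end" fires once the readable side has emitted everything
        parser.on("end", () => resolve(rows));

        source.pipe(parser);
    });
}
```

With the streams from the example above, usage would look something like collectRows(dataStream, parseStream).then(rows => console.log(rows.length)). The helper only relies on standard stream events, so it should not matter whether the source comes from request.get or fs.createReadStream.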

To use a local file instead of a remote file, you can do the same thing except the dataStream would be created using fs:

const fs = require("fs");

const dataStream = fs.createReadStream("./myfile.csv");

(You may want to use path.join and __dirname to build a path relative to the script's own location rather than to the directory the process was started from.)

David Liao answered Sep 23 '22 02:09