
Best way to read from a big CSV file without loading everything into memory using JavaScript

I'm using Atom/Electron to build an app that overlays visualisations on video based on data. Each video has a corresponding CSV file with information for each frame. The videos are about 100 minutes long, so the files hold plenty of data!

The problem I'm having is that it takes a couple of seconds to load and parse the file. Most of the time this is not a problem. But I need to make playlists out of parts of videos, and loading the whole CSV file each time the video changes is not a viable option.

I have been looking at file streaming options such as fast-csv, but I didn't manage to start reading from an arbitrary part of the file.

EDIT: The fs documentation says the following:

"options can include start and end values to read a range of bytes from the file instead of the entire file. Both start and end are inclusive and start at 0."

Given that, the question becomes: how can I know which byte corresponds to the position I want in the file?
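One way to answer the byte-position question is to scan the file once, remember the byte offset at which every line starts, and then turn a range of rows into a start/end byte range. The following is only a rough sketch under some assumptions: lines end in '\n', no quoted field contains a newline, and the names buildLineIndex, byteRange and readLines are made up for this illustration.

```javascript
const fs = require('fs');

// Scan the file once and record the byte offset at which each line starts.
// Assumption: '\n' line endings and no newlines inside quoted fields.
function buildLineIndex(path) {
  const fd = fs.openSync(path, 'r');
  const chunk = Buffer.alloc(64 * 1024);
  const offsets = [0];
  let position = 0;
  let bytesRead;
  try {
    while ((bytesRead = fs.readSync(fd, chunk, 0, chunk.length, position)) > 0) {
      for (let i = 0; i < bytesRead; i++) {
        // 0x0a is '\n'; the next line starts one byte after it.
        if (chunk[i] === 0x0a) offsets.push(position + i + 1);
      }
      position += bytesRead;
    }
  } finally {
    fs.closeSync(fd);
  }
  // An offset equal to the file size points at nothing (trailing newline
  // or empty file), so drop it.
  if (offsets[offsets.length - 1] === position) offsets.pop();
  return { offsets, size: position };
}

// Byte range covering lines firstLine..lastLine (0-based, inclusive).
// `end` is inclusive, matching the start/end options described by fs.
function byteRange(index, firstLine, lastLine) {
  const start = index.offsets[firstLine];
  const endExclusive = lastLine + 1 < index.offsets.length
    ? index.offsets[lastLine + 1]
    : index.size;
  return { start, end: endExclusive - 1 };
}

// Read only the requested lines from disk and return them as strings.
function readLines(path, index, firstLine, lastLine) {
  const { start, end } = byteRange(index, firstLine, lastLine);
  const buf = Buffer.alloc(end - start + 1);
  const fd = fs.openSync(path, 'r');
  try {
    fs.readSync(fd, buf, 0, buf.length, start);
  } finally {
    fs.closeSync(fd);
  }
  // Strip the final newline (if any) before splitting into lines.
  return buf.toString('utf8').replace(/\n$/, '').split('\n');
}
```

The index costs one full pass over the file, but it can be built once per video and kept in memory or saved next to the CSV. The object returned by byteRange can also be passed as the options to fs.createReadStream(path, ...) and piped into a streaming CSV parser instead of using readLines.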

What do you think would be the best and most performant approach to this situation?

Concretely:

Is there a way of starting to read a stream from any part of a CSV file?

Do you think another storage method would let me solve this problem better?

UPDATE:

In the end, I solved this by storing the data in a file in binary format. Since I know how many columns the file has, every record has the same size, so I can read straight from the segment of the file I need without any performance penalty.
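The update does not show the actual file layout, so the following is only a guess at what such a fixed-width binary format could look like. It assumes 4 columns per frame stored as little-endian 32-bit floats; COLUMNS, writeFrames and readFrames are hypothetical names chosen for this sketch. Because every record has the same size, the byte offset of frame n is simply n * RECORD_SIZE.

```javascript
const fs = require('fs');

// Assumed layout (not taken from the original post): each frame is one
// record of COLUMNS little-endian float32 values, with no header.
const COLUMNS = 4;
const BYTES_PER_VALUE = 4;
const RECORD_SIZE = COLUMNS * BYTES_PER_VALUE;

// Convert rows (arrays of COLUMNS numbers) into fixed-width binary records.
function writeFrames(path, rows) {
  const buf = Buffer.alloc(rows.length * RECORD_SIZE);
  rows.forEach((row, r) => {
    row.forEach((value, c) => {
      buf.writeFloatLE(value, r * RECORD_SIZE + c * BYTES_PER_VALUE);
    });
  });
  fs.writeFileSync(path, buf);
}

// Read up to `count` consecutive frames starting at frame `firstFrame`.
// Only the requested bytes are read from disk.
function readFrames(path, firstFrame, count) {
  const buf = Buffer.alloc(count * RECORD_SIZE);
  const fd = fs.openSync(path, 'r');
  let bytesRead;
  try {
    bytesRead = fs.readSync(fd, buf, 0, buf.length, firstFrame * RECORD_SIZE);
  } finally {
    fs.closeSync(fd);
  }
  const rows = [];
  // Decode only complete records; a request past the end of the file
  // yields fewer rows than asked for.
  for (let offset = 0; offset + RECORD_SIZE <= bytesRead; offset += RECORD_SIZE) {
    const row = [];
    for (let c = 0; c < COLUMNS; c++) {
      row.push(buf.readFloatLE(offset + c * BYTES_PER_VALUE));
    }
    rows.push(row);
  }
  return rows;
}
```

With a layout like this, jumping to a playlist segment is one seek plus one read, for example readFrames(path, firstFrameOfSegment, framesInSegment). Note that float32 keeps only about 7 significant digits, so a different value type may be needed for columns that require more precision.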

Asked by limoragni, Jun 22 '15 19:06


1 Answer

I would highly recommend Papa Parse for this. It can stream a CSV row by row, and with the 'header' option enabled each row is delivered as an object keyed by the column names from the file's header line.

Within the config object passed to the parsing function, you can supply a 'step' parameter: a function that is called for each row as the parser steps through the file.

Note: Papa Parse can also be configured to parse in a worker thread (the 'worker' option), which keeps the UI responsive when handling very large CSVs.

http://papaparse.com/docs

Answered by locksem, Sep 22 '22 05:09