Parsing large text files, modified on the fly

I need to parse a large CSV file in real-time, while it's being modified (appended) by a different process. By large I mean ~20 GB at this point, and slowly growing. The application only needs to detect and report certain anomalies in the data stream, for which it only needs to store small state info (O(1) space).
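The question doesn't say what the anomalies actually are; as a hypothetical illustration of an O(1)-state check, a detector that flags any decrease in one numeric column only needs to remember the previous value between rows:

```python
def make_monotonic_check(column_index):
    """Build a stateful checker that flags a row whenever the numeric
    value in `column_index` decreases relative to the previous row.

    Only the previous value is kept between calls, so memory stays
    O(1) no matter how many lines have been processed.
    """
    prev = None

    def check(line):
        nonlocal prev
        value = float(line.split(",")[column_index])
        anomaly = prev is not None and value < prev
        prev = value
        return anomaly

    return check
```

Feeding `check = make_monotonic_check(1)` the lines `"a,1"`, `"b,2"`, `"c,0"` flags only the third row; the checker never holds more than one float of state.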

I was thinking about polling the file's size every couple of seconds, opening a read-only stream, seeking to the previously saved position, and continuing to parse from where I last stopped. But since this is a text (CSV) file, I obviously need to handle newline boundaries when continuing, to make sure I always parse complete lines.
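A minimal sketch of that idea in Python (function name and CSV layout are illustrative, not from the question): read only the bytes appended since the saved offset, and consume nothing past the last newline, so a half-written final line is simply retried on the next poll.

```python
import os

def read_complete_lines(path, offset):
    """Read bytes appended since `offset`, returning only whole lines.

    Returns (lines, new_offset). Any bytes after the last newline are
    an incomplete line still being written by the other process, so
    they are not consumed: new_offset stops just past the last newline.
    """
    size = os.path.getsize(path)
    if size <= offset:
        return [], offset
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read(size - offset)
    last_nl = chunk.rfind(b"\n")
    if last_nl == -1:
        return [], offset  # no complete new line yet; try again later
    lines = chunk[:last_nl].decode("utf-8").split("\n")
    return lines, offset + last_nl + 1
```

A driver would then poll every couple of seconds, call this with the saved offset, feed the returned lines to the anomaly checks, and persist the new offset.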

If I am not mistaken, this shouldn't be hard to implement, but I wanted to know whether there is a common approach or library which already solves some of these problems?

Note: I don't need a CSV parser. I need info about a library which simplifies reading lines from a file which is being modified on the fly.

Groo asked Apr 27 '12 11:04


1 Answer

I did not test it, but I think you can use a FileSystemWatcher to detect when a different process modifies your file. In the Changed event handler, you can seek to the position you saved earlier and read the newly appended content.
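FileSystemWatcher is .NET-specific; as a portable sketch of the same change-then-read pattern, here is a hypothetical poller that stands in for the Changed event and invokes a handler whenever the file grows (the handler would do the seek-and-read described above):

```python
import os
import time

def watch_for_growth(path, handler, last_size=0, poll_interval=2.0,
                     max_polls=None):
    """Portable stand-in for a change notification such as .NET's
    FileSystemWatcher Changed event: poll the file size and call
    `handler(old_size, new_size)` whenever the file has grown.

    `max_polls` bounds the loop (useful for testing); None runs forever.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        size = os.path.getsize(path)
        if size > last_size:
            handler(last_size, size)
            last_size = size
        time.sleep(poll_interval)
```

Note that FileSystemWatcher can raise more than one Changed event for a single logical write, so the handler should read everything up to the current size rather than assume one event per appended line.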

schglurps answered Oct 08 '22 18:10