Task: Process three text files, each close to 1 GB in size, and turn them into CSV files. The source files have a custom structure, so regular expressions would be useful.
Problem: There is no problem. I use PHP for it and it's fine. I don't actually need to process the files faster; I'm just curious how you would approach the problem in general. In the end I'd like to see simple and convenient solutions that might perform faster than PHP.
@felix I'm sure about that. :) If I'm done with the whole project I'll probably post this as a cross-language code ping pong.
@mark My approach currently works like that, with the exception that I cache a few hundred lines to keep file writes low. A well-thought-out memory trade-off would probably squeeze out some more time. But I'm sure that other approaches can beat PHP by far, for example a full utilization of the *nix toolset.
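For reference, a rough sketch of that write-buffering idea in PHP might look like the following; `convert_line()` is a hypothetical stand-in for the real regex-based conversion, and the buffer size and file names are arbitrary placeholders:

```php
<?php
// Rough sketch: buffer a few hundred converted lines before writing.
// convert_line() is a hypothetical stand-in for the real regex conversion.
function convert_line(string $line): string {
    return str_replace("\t", ',', rtrim($line, "\n")) . "\n";
}

$bufferLimit = 500;               // arbitrary buffer size
$buffer = [];
$in  = fopen('source.txt', 'r');
$out = fopen('result.csv', 'w');

while (($line = fgets($in)) !== false) {
    $buffer[] = convert_line($line);
    if (count($buffer) >= $bufferLimit) {
        fwrite($out, implode('', $buffer));   // one write per few hundred lines
        $buffer = [];
    }
}
if ($buffer) {
    fwrite($out, implode('', $buffer));       // flush the remainder
}
fclose($in);
fclose($out);
```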
Firstly, it probably doesn't matter much which language you use for this, as the task will most likely be I/O-bound. What is more important is that you use an efficient approach/algorithm. In particular, you want to avoid reading the entire file into memory if possible, and avoid concatenating the result into a huge string before writing it to disk.
Instead, use a streaming approach: read a line of input, process it, then write a line of output.
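For illustration, a minimal sketch of that streaming approach in PHP could look like this; the file names and the regular expression are placeholders, since the actual source structure isn't shown here:

```php
<?php
// Minimal streaming sketch: one line in, one CSV row out.
// The pattern and file names are placeholders for the real source structure.
$pattern = '/^(\w+)\s+(\d+)\s+(.*)$/';

$in  = fopen('source.txt', 'r');
$out = fopen('result.csv', 'w');

while (($line = fgets($in)) !== false) {
    if (preg_match($pattern, $line, $m)) {
        // fputcsv takes care of quoting and escaping the CSV fields.
        fputcsv($out, [$m[1], $m[2], $m[3]]);
    }
}

fclose($in);
fclose($out);
```

Because fgets reads one line at a time, memory use stays constant regardless of file size, and the work is dominated by disk I/O rather than by the language doing the processing.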