I've found loads of examples on how to replace text in files using regex. However, it all boils down to two versions:
1. Iterate over all lines in the file and apply regex to each single line
2. Load the whole file into memory and apply the regex to it.
No. 2 is not feasible with "my" files: they're about 2 GiB...
As to No. 1: this is my current approach, but I was wondering: what if I need to apply a regex spanning more than one line?
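For reference, approach No. 1 can be sketched as below. This is only an illustration in Python; the function name `replace_per_line` and the file paths are made up for the example. It also shows where the approach stops working: each call to the regex sees a single line, so a pattern that has to cross a line break can never match.

```python
import re

def replace_per_line(src_path, dst_path, pattern, repl):
    """Stream src to dst, applying the regex to one line at a time."""
    regex = re.compile(pattern)
    # newline="" keeps the original line endings untouched
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:  # only the current line is held in memory
            dst.write(regex.sub(repl, line))
```

Memory use stays at roughly one line, regardless of file size, but a pattern such as `foo\nbar` is silently never applied.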
Here's the Answer:
There is no easy way
I found a StreamRegex class which might be able to do what I am looking for.
From what I could grasp of the algorithm:
That way it is not necessary to load the full file, or at least the chances of loading the full file into memory are reduced...
However, the worst case is that there is no match in the whole file; in that case the full file will be loaded into memory.
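To make the buffering idea concrete, here is a rough Python sketch. It is not the StreamRegex algorithm, just one way to search a stream with a sliding buffer. It avoids the worst case above by adding an assumption of my own: no match is longer than `max_match_len` characters. The names `iter_matches`, `max_match_len` and `chunk_size` are invented for this example, and the sketch assumes a pattern without `^`, `$` or lookarounds, since those depend on text outside the buffer.

```python
import io
import re

def iter_matches(stream, pattern, max_match_len, chunk_size=1 << 20):
    """Yield (start, end, text) for each match in a text stream.

    Assumption: no match is longer than max_match_len characters.
    Longer matches may be missed or cut short.
    """
    regex = re.compile(pattern, re.DOTALL)  # let '.' cross line breaks
    buffer = ""
    base = 0  # absolute offset of buffer[0] in the stream
    while True:
        chunk = stream.read(chunk_size)
        eof = chunk == ""
        buffer += chunk
        # A match starting at or after `limit` could still continue
        # in the next chunk, so it is left for the next round.
        limit = len(buffer) if eof else max(0, len(buffer) - max_match_len)
        pos = 0
        while True:
            m = regex.search(buffer, pos)
            if m is None or m.start() >= limit:
                break
            yield base + m.start(), base + m.end(), m.group()
            # Step past empty matches so the loop always advances
            pos = m.end() if m.end() > m.start() else m.start() + 1
        # Everything before `cut` has been fully examined and can go
        cut = max(pos, limit)
        buffer = buffer[cut:]
        base += cut
        if eof:
            return
```

With this variant the buffer never grows much beyond `chunk_size + max_match_len` characters, even if the file contains no match at all. The price is that the length limit has to be known in advance.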
Regex is not the way to go, especially not with these large amounts of text. Create a little parser of your own:
That will give you the start and end offsets of all the comment blocks. You should now be able to replace them by creating a temp file and copying the text from the original file into it (writing something else instead whenever you're inside a comment block, of course).
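A minimal Python sketch of such a parser and of the temp-file rewrite could look like this. Several things here are assumptions for illustration only: the comment blocks are taken to be delimited by `/*` and `*/`, and the function names `find_comment_blocks` and `replace_blocks` are invented. Files must be opened with `newline=""` so that character offsets are the same in both passes.

```python
import os
import tempfile

def find_comment_blocks(stream, chunk_size=1 << 20):
    """Return [(start, end)] character offsets of /* ... */ blocks (end exclusive)."""
    blocks, in_comment, start = [], False, 0
    prev = ""    # previous character, carried across chunk borders
    offset = 0   # absolute offset of the current character
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for ch in chunk:
            if not in_comment and prev == "/" and ch == "*":
                in_comment, start = True, offset - 1
                ch = ""  # this '*' must not also close the block ("/*/")
            elif in_comment and prev == "*" and ch == "/":
                in_comment = False
                blocks.append((start, offset + 1))
                ch = ""  # this '/' must not also open a block ("*/*")
            prev = ch
            offset += 1
    return blocks  # an unterminated block at end of file is ignored

def _copy(src, dst, count=None, chunk_size=1 << 20):
    """Copy `count` characters (all remaining if None); dst=None just skips them."""
    while count is None or count > 0:
        data = src.read(chunk_size if count is None else min(chunk_size, count))
        if not data:
            break
        if dst is not None:
            dst.write(data)
        if count is not None:
            count -= len(data)

def replace_blocks(src_path, blocks, replacement=""):
    """Rewrite src_path through a temp file, substituting each block."""
    directory = os.path.dirname(os.path.abspath(src_path))
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
         tempfile.NamedTemporaryFile("w", encoding="utf-8", newline="",
                                     dir=directory, delete=False) as tmp:
        pos = 0
        for start, end in blocks:
            _copy(src, tmp, start - pos)   # text before the block
            tmp.write(replacement)         # what goes in its place
            _copy(src, None, end - start)  # skip the block itself
            pos = end
        _copy(src, tmp)                    # text after the last block
    os.replace(tmp.name, src_path)         # swap the temp file in
```

A typical call would open the file with `open(path, encoding="utf-8", newline="")`, pass it to `find_comment_blocks`, and hand the resulting list to `replace_blocks(path, blocks)`. Both passes read in chunks, so memory use does not depend on the file size. The character-by-character loop is slow in pure Python, though, so treat this as a statement of the idea and not as something tuned for 2 GiB inputs.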
Edit: source files of 2GiB??