I have a HUGE file of 10G. I want to remove line 188888 from this file.
I use sed as follows:
sed -i '188888d' file
The problem is that it is really slow. I understand this is because of the size of the file, but is there any way to do it faster?
Thanks
Try
sed -i '188888{d;q}' file

Two caveats, though: d deletes the pattern space and starts the next cycle immediately, so a q placed after it on the same address never actually runs; and if you do make sed quit early (for example with GNU sed's 188888Q), it also stops copying, so everything after that line is lost. Quitting early only helps when you can afford to discard the rest of the file; otherwise you still have to spend the time re-writing the whole file. It would also be worth testing

sed '188888d' file > /path/to/alternate/mountpoint/newFile

where the alternate mountpoint is on a separate disk drive, so the reads and writes do not compete for the same device.
Final edit: one other option would be to delete the line while the file is being written, by filtering the stream through a pipe

yourLogFileProducingProgram | sed '188888d' > logFile

(-i is dropped here: sed is reading from a pipe, not editing a file in place.) But this assumes that the data you want to delete always lands at line 188888; is that possible?
I hope this helps.
Lines in a file are delimited by \n characters; if line lengths are variable, you cannot calculate the byte offset of a given line number directly, you have to count the newlines in front of it.
That will always be O(n), where n is the number of bytes in the file.
Parallel algorithms do not help either, because this operation is disk-I/O bound; divide and conquer would be even slower.
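For reference, the straight O(n) rewrite looks like this in Python (a minimal sketch; the function and file names are placeholders, not an existing tool):

```python
def delete_line(src, dst, lineno):
    """Copy src to dst, skipping the 1-based line `lineno`.

    Streams the whole file once, so it is O(n) in the file size.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for i, line in enumerate(fin, start=1):
            if i != lineno:
                fout.write(line)
```

Pointing dst at a different physical disk, as suggested above for the sed variant, keeps the read and write streams from competing.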
If you will do this often on the same file, there are ways to preprocess it and make lookups faster.
An easy way is to build an index of
line#:offset
pairs. When you want to find a line, do a binary search, O(log n), over the index for the line number you want, then use the offset to seek straight to that line in the original file.
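A minimal sketch of such an index in Python, using sparse checkpoints so the index stays small, with a bisect lookup (all names here are illustrative, not part of any library):

```python
from bisect import bisect_right

def build_index(path, every=1000):
    """Record a (line_number, byte_offset) checkpoint every `every` lines."""
    index = [(1, 0)]  # line 1 starts at byte offset 0
    offset = 0
    with open(path, "rb") as f:
        for n, line in enumerate(f, start=1):
            offset += len(line)
            if n % every == 0:
                index.append((n + 1, offset))
    return index

def read_line(f, index, lineno):
    """Seek to the nearest checkpoint at or before lineno, then skip forward."""
    i = bisect_right(index, (lineno, float("inf"))) - 1
    start, offset = index[i]
    f.seek(offset)
    for _ in range(lineno - start):
        f.readline()
    return f.readline()
```

A full line#:offset table gives O(1) lookups at the cost of one entry per line; the checkpoint spacing trades index size against how far read_line has to scan after seeking.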