
Fastest Way to Delete a Line from Large File in Python


I am working with a very large (~11GB) text file on a Linux system. I am running it through a program which is checking the file for errors. Once an error is found, I need to either fix the line or remove the line entirely. And then repeat...

Eventually once I'm comfortable with the process, I'll automate it entirely. For now however, let's assume I'm running this by hand.

What would be the fastest (in terms of execution time) way to remove a specific line from this large file? I thought of doing it in Python...but would be open to other examples. The line might be anywhere in the file.

If Python, assume the following interface:

def removeLine(filename, lineno):

Thanks,

-aj

AJ. asked Feb 24 '10 20:02



3 Answers

You can have two file objects for the same file at the same time (one for reading, one for writing):

def removeLine(filename, lineno):
    # lineno is zero-based: 0 removes the first line
    fro = open(filename, "rb")

    # skip the lines before the one to remove
    current_line = 0
    while current_line < lineno:
        fro.readline()
        current_line += 1

    # the writer starts where the unwanted line begins
    seekpoint = fro.tell()
    frw = open(filename, "r+b")
    frw.seek(seekpoint, 0)

    # read the line we want to discard
    fro.readline()

    # now move the rest of the lines in the file
    # one line back
    chars = fro.readline()
    while chars:
        frw.write(chars)  # write(), not writelines(): chars is a bytes object
        chars = fro.readline()

    fro.close()
    frw.truncate()
    frw.close()
K. Brafford answered Sep 22 '22 23:09
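
K. Brafford's approach above moves the tail of the file back one line at a time. As a rough sketch of a variation, assuming that copying the tail in large blocks is acceptable and that `lineno` is zero-based as in that answer, something like the following may reduce per-line overhead on a very large file. The name `remove_line_chunked` and the `chunk_size` parameter are illustrative, not part of the original answer.

```python
def remove_line_chunked(filename, lineno, chunk_size=1 << 20):
    """Remove the line at zero-based index `lineno`, shifting the tail back in blocks."""
    with open(filename, "rb") as src, open(filename, "r+b") as dst:
        # Skip the lines that come before the one to remove.
        for _ in range(lineno):
            src.readline()

        # The writer starts at the first byte of the unwanted line.
        dst.seek(src.tell())

        # Read past the unwanted line so it is never copied.
        src.readline()

        # Copy everything after it in fixed-size blocks. The writer always
        # trails the reader, so unread data is never overwritten.
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)

        # Cut off the leftover bytes at the end of the file.
        dst.truncate()
```

The whole tail still has to be rewritten, so the cost stays proportional to the amount of data after the removed line; the block size only changes how many read and write calls that takes.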


Modify the file in place: the offending line is overwritten with spaces, so the remainder of the file does not need to be shuffled around on disk. You can also "fix" the line in place, provided the fix is no longer than the line you are replacing.

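import os
from mmap import mmap

def removeLine(filename, lineno):
    # lineno is one-based: 1 blanks out the first line
    f = os.open(filename, os.O_RDWR)
    m = mmap(f, 0)
    p = 0
    # advance p to the first byte of the target line
    for i in range(lineno - 1):
        p = m.find(b'\n', p) + 1
    q = m.find(b'\n', p)
    if q == -1:
        # the last line has no trailing newline
        q = len(m)
    # mmap works on bytes, so search for and assign bytes
    m[p:q] = b' ' * (q - p)
    m.close()
    os.close(f)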

If the other program can be changed to output the file offset instead of the line number, you can assign the offset to p directly and do without the for loop.

John La Rooy answered Sep 25 '22 23:09
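
The offset-based variant and the in-place "fix" that John La Rooy describes could look roughly like the sketch below. It assumes the error checker can report the byte offset at which the bad line starts; the function name `overwrite_line_at` and its `replacement` parameter are hypothetical and chosen only for illustration.

```python
import mmap


def overwrite_line_at(filename, offset, replacement=b""):
    """Blank out, or fix in place, the line that starts at byte `offset`.

    `replacement` must not be longer than the original line; it is padded
    with spaces so that the file size and all later offsets stay the same.
    """
    with open(filename, "r+b") as f, mmap.mmap(f.fileno(), 0) as m:
        # Find the end of the line (exclusive of its newline).
        end = m.find(b"\n", offset)
        if end == -1:
            # The last line has no trailing newline.
            end = len(m)

        length = end - offset
        if len(replacement) > length:
            raise ValueError("replacement is longer than the original line")

        # Slice assignment on an mmap requires the same number of bytes.
        m[offset:end] = replacement.ljust(length, b" ")
        m.flush()
```

Because nothing after the line moves, only the bytes of that one line are touched. The trade-off is that a blanked line remains in the file as a run of spaces, so whatever reads the file afterwards has to tolerate that.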


As far as I know, you can't just open a text file with Python and remove a line. You have to make a new file and copy everything but that line to it. If you know the specific line, then you would do something like this:

# linenumtoremove is the one-based number of the line to drop;
# it must be set before this runs
with open('in.txt') as f, open('out.txt', 'w') as fo:
    ind = 1
    for line in f:
        if ind != linenumtoremove:
            fo.write(line)
        ind += 1

You could of course check the contents of each line instead to decide whether to keep it. If you have a whole list of lines to remove or change, I also recommend making all of those changes in one pass through the file.

Justin Peel answered Sep 25 '22 23:09
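
Justin Peel's suggestion of handling a whole list of lines in one pass might be sketched as follows. This assumes one-based line numbers, enough free disk space for a second copy of the file, and that replacing the original afterwards is acceptable; the function name `remove_lines` and the `.tmp` suffix are illustrative choices, not something from the answer.

```python
import os


def remove_lines(filename, linenos):
    """Drop every one-based line number in `linenos` in a single pass."""
    unwanted = set(linenos)           # set lookup is O(1) per line
    tmp_name = filename + ".tmp"      # hypothetical temporary file name

    with open(filename, "rb") as src, open(tmp_name, "wb") as dst:
        for lineno, line in enumerate(src, start=1):
            if lineno not in unwanted:
                dst.write(line)

    # Swap the filtered copy into place of the original.
    os.replace(tmp_name, filename)
```

A call such as `remove_lines("in.txt", [3, 17, 4000])` reads and writes the file once no matter how many lines are listed, which is usually cheaper than rewriting the tail once per bad line.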