
Fastest way to find and replace specific line in a large text file with Python

I have a numbers.txt file that consists of several hundred thousand lines, each made up of two numbers separated by a colon:

407597693:1604722326.2426915
510905857:1604722326.2696202
76792361:1604722331.120079
112854912:1604722333.4496727
470822611:1604722335.283259

My goal is to locate the line with the number 407597693 on the left side and add 3600 to the number on its right side, then rewrite numbers.txt with the change. I have to perform the same operation (just with a different number) on the same file repeatedly, as fast as possible.

I have managed to make it work using with open file operations and a for loop over each line: searching for the needed number, modifying the line, and then rewriting the whole file. However, I've noticed that each such operation takes about 0.2-0.5 seconds, which adds up over time and slows everything down considerably.

Here is the code I am using:

import os
import time

number = 407597693

with open("numbers.txt", "r+") as library:
    file = library.read()
    if (str(number) + ":") in file:
        lines = file.splitlines()
        with open("numbers_temp.txt", "a+") as library_temp:
            for line in lines:
                if (str(number) + ":") in line:
                    # rewrite the matching line with an updated timestamp
                    library_temp.write(
                        "\n" + str(number) + ":" + str(time.time() + 3600)
                    )
                else:
                    library_temp.write("\n" + line)

            library_temp.seek(0)
            new_file = library_temp.read()

            with open("numbers.txt", "w+") as library_2:
                library_2.write(new_file)

        os.remove("numbers_temp.txt")

I would really appreciate any input on how to speed up this process. Many thanks in advance!


2 Answers

You can open a memory-mapped file, use a regular expression to find the line you want, and with any luck you'll only have to change one page in the file. I'm using the decimal module so that you don't have decimal-to-binary float conversion problems. Usually the new number and the old number will be the same width, so the file contents will not need to be moved. I'm showing a Linux example; Windows mmap.mmap is a bit different but should be easy to use.

import mmap
import re
from decimal import Decimal

def increment_record(filename, findval, increment):
    with open(filename, "rb+") as fp:
        with mmap.mmap(fp.fileno(), 0) as fmap:
            search = re.search(rf"{findme}:([\d\.]+)".encode("ascii"), fmap, 
                    re.MULTILINE)
            if search:
                # found float to change. Use Decimal for base 10 precision
                newval = Decimal(search.group(1).decode("ascii")) + increment
                newval = f"{newval}".encode("ascii")
                delta = len(newval) - len(search.group(1))
                if delta:
                    # need to expand file and copy
                    fsize = fmap.size()
                    fmap.resize(fsize + delta)
                    fmap.move(search.end(1) + delta, search.end(1), 
                        fsize - search.end(1))
                # change just the number
                fmap[search.start(1):search.start(1) + len(newval)] = newval

# test parameters
filename = "test.txt"
findme = "76792361"
increment = 3600

testdata = """407597693:1604722326.2426915
510905857:1604722326.2696202
76792361:1604722331.120079
112854912:1604722333.4496727
470822611:1604722335.283259"""

open(filename, "w").write(testdata)

increment_record(filename, findme, increment)

print("changes:")
for old,new in zip(testdata.split("\n"), open(filename)):
    new = new.strip()
    if old != new:
        print((old,new))
print("done")


I assume your memory can hold the whole file. This should be faster, using a regex:

import re
import time

number = 407597693
with open("numbers.txt", "r") as f:
    data = f.read()
    # data = re.sub(f'({number}):(.*)', lambda x: f"{x.group(1)}:{float(x.group(2))+3600}", data)
    data = re.sub("^" + str(number) + ":.*\n", str(number) + ":" + str(int(time.time()) + 3600) + "\n", data, flags=re.MULTILINE)
with open("numbers.txt", "w") as f:
    f.write(data)
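
The commented-out line above is worth spelling out: it adds 3600 to the timestamp already stored on the line, whereas the active re.sub (like the question's own code) writes the current time plus 3600. A minimal standalone sketch of that variant, with an illustrative function name:

import re

def add_to_timestamp(data, number, increment=3600):
    # Increment the timestamp stored for `number` instead of replacing it
    # with time.time() + 3600; returns the modified file contents.
    return re.sub(
        rf"^({number}):(\S+)$",
        lambda m: f"{m.group(1)}:{float(m.group(2)) + increment}",
        data,
        flags=re.MULTILINE,
    )

As the first answer notes, float arithmetic can change the printed precision of the timestamp; using Decimal instead of float avoids that.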


