How to find and replace multiple lines in text file?

I am running Python 2.7.

I have three text files: data.txt, find.txt, and replace.txt. Now, find.txt contains several lines that I want to search for in data.txt and replace that section with the content in replace.txt. Here is a simple example:

data.txt

pumpkin
apple
banana
cherry
himalaya
skeleton
apple
banana
cherry
watermelon
fruit

find.txt

apple
banana
cherry

replace.txt

1
2
3

So, in the above example, I want to search for all occurrences of the block apple, banana, cherry in the data and replace those lines with 1, 2, 3.

I am having some trouble with the right approach to this as my data.txt is about 1MB so I want to be as efficient as possible. One dumb way is to concatenate everything into one long string and use replace, and then output to a new text file so all the line breaks will be restored.

import re

data = open("data.txt", 'r')
find = open("find.txt", 'r')
replace = open("replace.txt", 'r')

data_str = ""
find_str = ""
replace_str = "" 

for line in data: # concatenate it into one long string
    data_str += line

for line in find: # concatenate it into one long string
    find_str += line

for line in replace: 
    replace_str += line


new_data = data_str.replace(find_str, replace_str)  # pass the strings, not the file objects
new_file = open("new_data.txt", "w")
new_file.write(new_data)
new_file.close()

But this seems so convoluted and inefficient for a large data file like mine. Also, the replace function appears to be deprecated so that's not good.

Another way is to step through the lines and keep track of the line where a possible match starts.

Something like this:

location = 0

LOOP1: 
for find_line in find:
    for i, data_line in enumerate(data).startingAtLine(location):
        if find_line == data_line:
            location = i # found possibility

for idx in range(NUMBER_LINES_IN_FIND):
    if find_line[idx] != data_line[idx+location]  # compare line by line
        #if the subsequent lines don't match, then go back and search again
        goto LOOP1

Not fully formed code, I know. I don't even know if it's possible to search through a file from a certain line on or between certain lines but again, I'm just a bit confused in the logic of it all. What is the best way to do this?

Thanks!

noblerare asked Feb 07 '14 20:02
1 Answer

If the file is large, you want to read and write one line at a time, so the whole thing isn't loaded into memory at once.

# create a dict of find keys and replace values
with open('find.txt') as f:
    findlines = f.read().split('\n')
with open('replace.txt') as f:
    replacelines = f.read().split('\n')
find_replace = dict(zip(findlines, replacelines))
find_replace.pop('', None)  # a trailing newline yields an empty key; drop it

with open('data.txt') as data:
    with open('new_data.txt', 'w') as new_data:
        for line in data:
            # replace every find string that appears in this line
            for key in find_replace:
                if key in line:
                    line = line.replace(key, find_replace[key])
            new_data.write(line)

Edit: I changed the code to use read().split('\n') instead of readlines() so that \n isn't included in the find and replace strings.
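
One caveat: the loop above replaces each find string wherever it appears on its own, so a lone apple line would also become 1. If the lines in find.txt should only be replaced when they occur together as a consecutive block, a sliding window over the data lines is one option. The following is a minimal sketch under that assumption, not something run against your real files. The names replace_blocks and replace_in_file are hypothetical helpers, and lines are compared exactly once the trailing newline is stripped.

```python
from collections import deque


def replace_blocks(lines, find_block, replace_block):
    """Yield lines, emitting replace_block in place of every
    consecutive run of lines equal to find_block."""
    find_block = list(find_block)
    n = len(find_block)
    if n == 0:
        # Nothing to search for: pass the input through unchanged.
        for line in lines:
            yield line
        return

    window = deque()  # holds at most n pending lines
    for line in lines:
        window.append(line)
        if len(window) < n:
            continue
        if list(window) == find_block:
            # Whole block matched: emit the replacement instead.
            for out in replace_block:
                yield out
            window.clear()
        else:
            # Oldest pending line cannot start a match; release it.
            yield window.popleft()

    # Flush whatever is left (fewer than n lines, so no match possible).
    for line in window:
        yield line


def replace_in_file(data_path, find_path, replace_path, out_path):
    """Stream data_path to out_path one line at a time (hypothetical helper)."""
    with open(find_path) as f:
        find_block = f.read().splitlines()
    with open(replace_path) as f:
        replace_block = f.read().splitlines()
    with open(data_path) as data:
        with open(out_path, 'w') as out:
            stripped = (line.rstrip('\n') for line in data)
            for line in replace_blocks(stripped, find_block, replace_block):
                # Every output line gets a newline, including the last one.
                out.write(line + '\n')
```

With the files from the question, calling replace_in_file('data.txt', 'find.txt', 'replace.txt', 'new_data.txt') should write pumpkin, 1, 2, 3, himalaya, skeleton, 1, 2, 3, watermelon, fruit, one per line. Only len(find_block) lines are held in memory at any time, so it works the same way on much larger files.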

mhlester answered Sep 27 '22 21:09