Removal of duplicate lines from a text file using python [duplicate]

Question

Earlier I wrote code that extracts a specific string from multiple files and stores the results in a separate file. That file now contains duplicate results, which I need to remove.

import glob
import re
import os.path

path = r"H:\sample"
file_array = glob.glob(os.path.join(path, '*.txt'))
# The with block closes out_file on exit, so no explicit close() is needed
with open("aiq_hits.txt", "w") as out_file:
    for input_filename in file_array:
        with open(input_filename) as in_file:
            for line in in_file:
                # Find every single- or double-quoted string ending in .aiq on this line
                match = re.findall(r"""(?<=')[^']*\.aiq(?=')|(?<=")[^"]*\.aiq(?=")""", line)
                for item in match:
                    out_file.write("%s\n" % item)

This output file (aiq_hits.txt) contains duplicate results. I need to remove them, and the deduplicated result should end up in the same file.
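One option, sketched here as an assumption rather than something the question's code does, is to skip duplicates while extracting, so the output file never contains them. The helper name unique_hits is hypothetical; the pattern is the one from the question. It takes any iterable of lines and returns each match once, in first-seen order.

```python
import re

# Pattern from the question: a single- or double-quoted string ending in .aiq
AIQ_PATTERN = re.compile(r"""(?<=')[^']*\.aiq(?=')|(?<=")[^"]*\.aiq(?=")""")


def unique_hits(lines):
    """Return .aiq matches found in lines, keeping only the first occurrence of each.

    unique_hits is a hypothetical helper name used for illustration.
    """
    seen = set()   # matches already collected, for fast membership checks
    hits = []      # matches in the order they were first seen
    for line in lines:
        for item in AIQ_PATTERN.findall(line):
            if item not in seen:
                seen.add(item)
                hits.append(item)
    return hits
```

The returned list could then be written to aiq_hits.txt one item per line, in place of the inner write loop.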

Vivek Sable · Accepted Answer

Open the input file.
Read it line by line. readlines() returns a list of the lines in the file.
Create a new list, new_lines.
Iterate over the lines.
Strip leading and trailing whitespace from each line.
Check whether the stripped line is already in new_lines.
If it is not, append it to new_lines.
Write new_lines to the output file.

Demo:

input_file = "input.txt"
with open(input_file, "r") as fp:
    lines = fp.readlines()
    new_lines = []
    for line in lines:
        #- Strip white spaces
        line = line.strip()
        #- Keep only the first occurrence of each line
        if line not in new_lines:
            new_lines.append(line)

#- Set output_file to input_file to overwrite the original file instead
output_file = "output.txt"
with open(output_file, "w") as fp:
    fp.write("\n".join(new_lines))

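For larger files, the list membership check in the demo gets slow, because each lookup scans the whole list. A possible variant, not part of the accepted answer, uses dict.fromkeys, which keeps the first occurrence of each key in insertion order (guaranteed from Python 3.7). The function name dedupe_file_in_place is hypothetical. It writes back to the same path, which is what the question asks for.

```python
def dedupe_file_in_place(path):
    """Rewrite the file at path with duplicate lines removed, keeping first occurrences.

    dedupe_file_in_place is a hypothetical name used for illustration.
    """
    with open(path, "r") as fp:
        # Strip whitespace so lines compare equal regardless of trailing newlines,
        # matching the behaviour of the demo
        lines = [line.strip() for line in fp]

    # dict keys are unique and preserve insertion order (Python 3.7+)
    unique_lines = list(dict.fromkeys(lines))

    # The file was fully read and closed above, so it is safe to overwrite it now
    with open(path, "w") as fp:
        fp.write("\n".join(unique_lines))
    return unique_lines
```

Calling dedupe_file_in_place("aiq_hits.txt") after the extraction script has finished would deduplicate the results file in place.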
Tags: python, duplicates

goshanky
