Python - Locating Duplicate Words in a Text File

Tags:

python

I was wondering if you could help me with a Python programming issue. I'm trying to write a program that reads a text file and outputs "word 1 True" if the word has already occurred in that file, or "word 1 False" if this is the first time the word has appeared.

Here's what I came up with:

fh = open(fname)
lst = list ()
for line in fh:
    words = line.split()
    for word in words:
        if word in words:
            print("word 1 True", word)
        else:
            print("word 1 False", word)

However, it only ever prints "word 1 True".

Please advise.

Thanks!

Sketch0482 asked Jun 04 '26 16:06



1 Answer

A simple (and fast) way to implement this is with a Python dictionary. A dictionary can be thought of as an array whose index (the key) is a string rather than a number.

This gives some code fragments like:

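found_words = {}    # empty dictionary: maps each word to True once it has been seen
# split() with no argument splits on any whitespace, including newlines
with open("words1.txt", "rt") as fh:
    words1 = fh.read().split()  # TODO - handle punctuation
for word in words1:
    if word in found_words:
        print(word + " already in file")
    else:
        found_words[word] = True    # could be set to anything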

Now, when processing your words, simply checking whether the word already exists in the dictionary tells you whether it has been seen before.
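As for why the code in the question always prints True: `word in words` tests each word against the very line it was just taken from, so the test can never fail. Below is a minimal sketch of the "seen so far" idea that prints True or False for every word, as the question asks. It uses a set instead of a dictionary, since only membership matters. The function name `flag_repeats` and the sample sentence are made up for illustration, and ignoring case and surrounding punctuation is an assumption about what should count as the same word.

```python
import string


def flag_repeats(text):
    """Yield (word, seen_before) for each whitespace-separated word in text."""
    seen = set()  # words encountered so far
    for raw in text.split():
        # Assumption: case and surrounding punctuation are ignored.
        word = raw.strip(string.punctuation).lower()
        if not word:
            continue  # the token was punctuation only
        yield word, word in seen
        seen.add(word)


# Hypothetical sample text; for a file, pass the result of fh.read() instead.
sample = "The cat saw the other cat. The end!"
for word, repeated in flag_repeats(sample):
    print(word, repeated)
```

If exact matching is wanted instead, dropping the `strip(...)` and `lower()` calls leaves the plain membership check.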

Kingsley answered Jun 06 '26 05:06
