Counting unique words in Python

My code so far is this:

from glob import glob
pattern = "D:\\report\\shakeall\\*.txt"
filelist = glob(pattern)
def countwords(fp):
    with open(fp) as fh:
        return len(fh.read().split())
print "There are" ,sum(map(countwords, filelist)), "words in the files. " "From directory",pattern

I want to add code that counts the unique words across all the files matching pattern (42 txt files in this path), but I don't know how. Can anybody help me?

rocksland asked Aug 10 '12 10:08


2 Answers

The best way to count objects in Python is the collections.Counter class, which was created for that purpose. It behaves like a Python dict but is easier to use for counting: pass it a list (or any other iterable) of objects and it counts them for you.

>>> from collections import Counter
>>> c = Counter(['hello', 'hello', 1])
>>> print c
Counter({'hello': 2, 1: 1})

Counter also has some useful methods, such as most_common; see the documentation to learn more.
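
As a small illustration of most_common, here is a sketch on a made-up sentence (the sample text and variable names are assumptions, not part of the question). It is written with print() calls in Python 3 syntax, unlike the Python 2 snippets in this thread:

```python
from collections import Counter

# Hypothetical sample text, used only for illustration.
words = "the cat and the dog and the bird".split()
counts = Counter(words)

# most_common(n) returns the n most frequent items as (item, count) pairs.
top_two = counts.most_common(2)
print(top_two)  # [('the', 3), ('and', 2)]

# The number of distinct keys is the number of unique words.
print(len(counts))  # 5
```

Since a Counter keeps one key per distinct item, len(counts) is already the unique-word count the question asks for.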

Another very useful method of the Counter class is update. After you have instantiated a Counter from a list of objects, you can pass more objects to update and it continues counting, adding to the existing counts rather than resetting them:

>>> from collections import Counter
>>> c = Counter(['hello', 'hello', 1])
>>> print c
Counter({'hello': 2, 1: 1})
>>> c.update(['hello'])
>>> print c
Counter({'hello': 3, 1: 1})
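
Applied to the question, update could be called once per file so that a single Counter accumulates the words from every file matched by the glob pattern. The following is only a sketch under a few assumptions: the helper name count_words is hypothetical, words are lowercased so that "To" and "to" count as the same word, and the demo writes two small temporary files instead of reading the asker's directory. It uses Python 3 print() syntax:

```python
import os
import tempfile
from collections import Counter
from glob import glob


def count_words(file_list):
    """Return a Counter of lowercased, whitespace-separated words from all files."""
    counts = Counter()
    for path in file_list:
        with open(path) as fh:
            # update() adds to the existing counts instead of replacing them.
            counts.update(word.lower() for word in fh.read().split())
    return counts


# Demo data: two throwaway files standing in for the real *.txt files.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "a.txt"), "w") as fh:
    fh.write("To be or not to be")
with open(os.path.join(demo_dir, "b.txt"), "w") as fh:
    fh.write("Not now")

counts = count_words(glob(os.path.join(demo_dir, "*.txt")))
print("Total words:", sum(counts.values()))  # Total words: 8
print("Unique words:", len(counts))          # Unique words: 5
```

With the asker's data, the call would presumably be count_words(filelist), where filelist is the list already built from pattern in the question.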

Rostyslav Dzinko answered Oct 07 '22 08:10



print len(set(w.lower() for w in open('filename.dat').read().split()))

This reads the entire file into memory, splits it into words on whitespace, converts each word to lowercase, builds a set of the (unique) lowercase words, counts them, and prints the result.
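
The same set-based idea can be stretched over several files by keeping one set and adding each file's words to it. The sketch below makes a few assumptions: unique_words is a hypothetical helper name, files are read line by line rather than all at once (a swapped-in variation to limit memory use), and the demo uses two temporary files in place of the real ones. It is in Python 3 syntax:

```python
import os
import tempfile
from glob import glob


def unique_words(file_list):
    """Return the set of lowercased, whitespace-separated words found in all files."""
    seen = set()
    for path in file_list:
        with open(path) as fh:
            for line in fh:
                # set.update() ignores words that are already present.
                seen.update(word.lower() for word in line.split())
    return seen


# Demo data: two throwaway files standing in for the real ones.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "x.txt"), "w") as fh:
    fh.write("Hello world\nhello again")
with open(os.path.join(demo_dir, "y.txt"), "w") as fh:
    fh.write("World peace")

vocab = unique_words(glob(os.path.join(demo_dir, "*.txt")))
print(len(vocab))     # 4
print(sorted(vocab))  # ['again', 'hello', 'peace', 'world']
```

Note that splitting on whitespace leaves punctuation attached, so "word" and "word," would still count as two different words unless the text is cleaned first.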

NIlesh Sharma answered Oct 07 '22 07:10