My code so far is this:
from glob import glob

pattern = "D:\\report\\shakeall\\*.txt"
filelist = glob(pattern)

def countwords(fp):
    # Count the whitespace-separated words in one file.
    with open(fp) as fh:
        return len(fh.read().split())

print("There are %d words in the files. From directory %s" % (sum(map(countwords, filelist)), pattern))
I want to add code that counts the unique words across the files matched by pattern (42 .txt files in this path), but I don't know how. Can anybody help?
The best way to count objects in Python is the collections.Counter class, which was created for exactly that purpose. It is a dict subclass, so it acts like a Python dict, but it is easier to use for counting: pass it a list (or any iterable) of objects and it counts them automatically.
>>> from collections import Counter
>>> c = Counter(['hello', 'hello', 1])
>>> print(c)
Counter({'hello': 2, 1: 1})
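One concrete way Counter is easier than a plain dict for counting: looking up an object that was never counted returns 0 instead of raising KeyError, so keys never need to be initialized first. A minimal comparison, using a made-up word list:

```python
from collections import Counter

words = ["hello", "hello", "world"]

# With a plain dict, each key needs a default before it can be incremented.
plain = {}
for w in words:
    plain[w] = plain.get(w, 0) + 1

# Counter produces the same tally in one call.
counted = Counter(words)

print(plain == dict(counted))  # True
print(counted["missing"])      # 0, and no KeyError is raised
```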
Counter also has useful methods such as most_common; see the collections module documentation to learn more.
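As a small illustration (the sentence is made up for the example), most_common returns (object, count) pairs ordered from most to least frequent, and len() of a Counter gives the number of distinct objects, which is exactly the "unique words" figure the question asks for:

```python
from collections import Counter

# A made-up sentence, split on whitespace into words.
words = "the cat and the hat and the bat".split()
counts = Counter(words)

# The two most frequent words, with their counts.
top_two = counts.most_common(2)
print(top_two)      # [('the', 3), ('and', 2)]

# The number of distinct words is the number of keys.
print(len(counts))  # 5
```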
Another useful Counter method is update. After you've created a Counter from a list of objects, you can pass more objects to update and it keeps counting, adding to the existing counts instead of discarding them:
>>> from collections import Counter
>>> c = Counter(['hello', 'hello', 1])
>>> print(c)
Counter({'hello': 2, 1: 1})
>>> c.update(['hello'])
>>> print(c)
Counter({'hello': 3, 1: 1})
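Applied to the question, update lets a single Counter accumulate words across every file that glob returns. This is a sketch under a few assumptions: the files are plain text readable with the default encoding, a "word" is any whitespace-separated token, and case is folded with str.lower(). count_words_in_files is a hypothetical helper name, not part of the original code.

```python
from collections import Counter
from glob import glob


def count_words_in_files(paths):
    """Return a Counter of lowercase words across all files in paths.

    Hypothetical helper: words are whitespace-separated tokens, and
    "The" and "the" are counted as the same word.
    """
    counts = Counter()
    for path in paths:
        with open(path) as fh:
            # update() adds to the existing counts instead of replacing them.
            counts.update(word.lower() for word in fh.read().split())
    return counts


if __name__ == "__main__":
    pattern = "D:\\report\\shakeall\\*.txt"
    counts = count_words_in_files(glob(pattern))
    print("There are %d words in the files." % sum(counts.values()))
    print("There are %d unique words in the files." % len(counts))
```

Because the result is a Counter, the same object also answers follow-up questions such as counts.most_common(10) for the whole directory.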
print(len(set(w.lower() for w in open('filename.dat').read().split())))
This reads the entire file into memory, splits it into words on whitespace, converts each word to lower case, builds a set of the unique lowercase words, and prints how many there are.
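The same set-based idea extends to all of the files from the question. This sketch also strips leading and trailing punctuation using string.punctuation, so that "end" and "end." are not counted as different words; that normalization is an assumption about what counts as a word, not something the one-liner does. unique_words is a hypothetical helper name.

```python
import string
from glob import glob


def unique_words(paths):
    """Return the set of distinct lowercase words across all files in paths."""
    words = set()
    for path in paths:
        # "with" closes each file, unlike the bare open() in the one-liner.
        with open(path) as fh:
            for token in fh.read().split():
                # Assumption: strip punctuation at the edges only, so
                # "don't" keeps its apostrophe but "end." becomes "end".
                word = token.strip(string.punctuation).lower()
                if word:  # skip tokens that were pure punctuation, e.g. "--"
                    words.add(word)
    return words


if __name__ == "__main__":
    pattern = "D:\\report\\shakeall\\*.txt"
    print("There are %d unique words in the files." % len(unique_words(glob(pattern))))
```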