I am counting word of a txt file with the following code:
#!/usr/bin/python file=open("D:\\zzzz\\names2.txt","r+") wordcount={} for word in file.read().split(): if word not in wordcount: wordcount[word] = 1 else: wordcount[word] += 1 print (word,wordcount) file.close();
this is giving me the output like this:
>>> goat {'goat': 2, 'cow': 1, 'Dog': 1, 'lion': 1, 'snake': 1, 'horse': 1, '': 1, 'tiger': 1, 'cat': 2, 'dog': 1}
but I want the output in the following manner:
word wordcount goat 2 cow 1 dog 1.....
Also I am getting an extra symbol in the output (
). How can I remove this?
To count the number of words in only part of your document, select the text you want to count. Then on the Tools menu, click Word Count. Just like the Word desktop program, Word for the web counts words while you type.
The most easiest way to count the number of lines, words, and characters in text file is to use the Linux command “wc” in terminal. The command “wc” basically means “word count” and with different optional parameters one can use it to count the number of lines, words, and characters in a text file.
The funny symbols you're encountering are a UTF-8 BOM (Byte Order Mark). To get rid of them, open the file using the correct encoding (I'm assuming you're on Python 3):
file = open(r"D:\zzzz\names2.txt", "r", encoding="utf-8-sig")
Furthermore, for counting, you can use collections.Counter
:
from collections import Counter wordcount = Counter(file.read().split())
Display them with:
>>> for item in wordcount.items(): print("{}\t{}".format(*item)) ... snake 1 lion 2 goat 2 horse 3
#!/usr/bin/python file=open("D:\\zzzz\\names2.txt","r+") wordcount={} for word in file.read().split(): if word not in wordcount: wordcount[word] = 1 else: wordcount[word] += 1 for k,v in wordcount.items(): print k, v
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With