Word count from a txt file program

I am counting the words in a txt file with the following code:

#!/usr/bin/python
file=open("D:\\zzzz\\names2.txt","r+")
wordcount={}
for word in file.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
print (word,wordcount)
file.close();

This gives me output like this:

>>>  goat {'goat': 2, 'cow': 1, 'Dog': 1, 'lion': 1, 'snake': 1, 'horse': 1, 'ï»¿': 1, 'tiger': 1, 'cat': 2, 'dog': 1}

But I want the output in the following format:

word    wordcount
goat    2
cow     1
dog     1
.....

Also, I am getting an extra symbol in the output (ï»¿). How can I remove it?

asked Jan 14 '14 06:01

user3068762

2 Answers

The funny symbols you're encountering (ï»¿) are a UTF-8 BOM (Byte Order Mark) at the start of the file, decoded with the wrong encoding. To get rid of them, open the file using the correct encoding (I'm assuming you're on Python 3):

file = open(r"D:\zzzz\names2.txt", "r", encoding="utf-8-sig")
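To see the effect in isolation, here is a minimal sketch. It assumes the file was being decoded as cp1252 (a common Windows default, which would explain the `ï»¿` characters); the temporary file and its contents are made up for the demonstration.

```python
import os
import tempfile

# Write a small sample file that starts with the UTF-8 BOM bytes (EF BB BF).
# The file name and contents are hypothetical, chosen only for this demo.
sample_path = os.path.join(tempfile.mkdtemp(), "names_sample.txt")
with open(sample_path, "wb") as f:
    f.write(b"\xef\xbb\xbf\ngoat cow goat")

# Decoding as cp1252 turns the three BOM bytes into visible characters,
# which then show up as a bogus first "word".
with open(sample_path, "r", encoding="cp1252") as f:
    words_cp1252 = f.read().split()

# "utf-8-sig" decodes as UTF-8 and drops a leading BOM if one is present.
with open(sample_path, "r", encoding="utf-8-sig") as f:
    words_sig = f.read().split()

print(words_cp1252)
print(words_sig)
```

With `utf-8-sig` the stray entry disappears from the word list, so it never reaches the counting step.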

Furthermore, for counting, you can use collections.Counter:

from collections import Counter
wordcount = Counter(file.read().split())

Display them with:

>>> for item in wordcount.items(): print("{}\t{}".format(*item))
...
snake   1
lion    2
goat    2
horse   3
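If you also want the `word  wordcount` header from the question, something along these lines might work. This is only a sketch: the helper name `format_word_counts` and the sample text are made up, and sorting by frequency with `most_common()` is an extra choice, not something the question asked for.

```python
from collections import Counter


def format_word_counts(text):
    """Return a header line followed by one 'word<TAB>count' line per word."""
    counts = Counter(text.split())
    lines = ["word\twordcount"]
    # most_common() yields (word, count) pairs, highest count first.
    for word, count in counts.most_common():
        lines.append("{}\t{}".format(word, count))
    return "\n".join(lines)


sample_text = "goat cow goat cat dog cat"  # made-up sample input
report = format_word_counts(sample_text)
print(report)
```

For a real file, pass in the result of `file.read()` from a file opened with `encoding="utf-8-sig"`, as above.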

answered Oct 14 '22 08:10

Tim Pietzcker

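#!/usr/bin/python
file=open("D:\\zzzz\\names2.txt","r+")
wordcount={}
# Tally how many times each whitespace-separated word occurs
for word in file.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
# Print one "word count" pair per line (print() form for Python 3)
for k,v in wordcount.items():
    print(k, v)
file.close()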

answered Oct 14 '22 08:10

bistaumanga
