As the title says:
So far this is where I'm at. My code does work, but I am having trouble displaying the information in order. Currently it just displays the information in what looks like random order.
def frequencies(filename):
    infile=open(filename, 'r')
    wordcount={}
    content = infile.read()
    infile.close()
    counter = {}
    invalid = "‘'`,.?!:;-_\n—' '"
    for word in content:
        word = content.lower()
        for letter in word:
            if letter not in invalid:
                if letter not in counter:
                    counter[letter] = content.count(letter)
                    print('{:8} appears {} times.'.format(letter, counter[letter]))
Any help would be greatly appreciated.
Dictionaries are not sorted by count (and before Python 3.7 they did not guarantee any order at all), so iterating over one will not give you the letters by frequency. Also, if you want to count items in a collection, it is better to use collections.Counter(), which is more optimized and Pythonic for this purpose. You can then call Counter.most_common(N) to get the N most common items in your Counter object.
As for opening files, you can use the with statement, which closes the file automatically at the end of the block. It is also better not to print the final result inside your function. Instead, you can make the function a generator that yields the formatted lines, and print them whenever you want.
from collections import Counter

def frequencies(filename, top_n):
    with open(filename) as infile:
        content = infile.read()
    invalid = "‘'`,.?!:;-_\n—' '"
    # Count every character that is not in the invalid string.
    counter = Counter(ch for ch in content if ch not in invalid)
    for letter, count in counter.most_common(top_n):
        yield '{:8} appears {} times.'.format(letter, count)
Then use a for loop in order to iterate over the generator function:
for line in frequencies(filename, 100):
    print(line)
You don't need to iterate over 'words' and then over the letters in them. When you iterate over a string (like content), you already get single characters (strings of length 1). You should also wait until after the counting loop has finished before showing any output. After counting, you could sort manually:
for letter, count in sorted(counter.items(), key=lambda x: x[1], reverse=True):
    # do stuff
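To make the manual approach concrete, here is a minimal sketch that counts in a single pass with a plain dict and sorts only once the counting is done. The helper names count_letters and by_descending_count and the sample text are made up for illustration; the invalid string is the one from the question.

```python
def count_letters(text, invalid="‘'`,.?!:;-_\n—' '"):
    """Count each valid character of text in one pass (hypothetical helper)."""
    counts = {}
    for letter in text.lower():
        if letter not in invalid:
            # Increment instead of calling text.count() for every letter.
            counts[letter] = counts.get(letter, 0) + 1
    return counts


def by_descending_count(counts):
    """Return (letter, count) pairs, most frequent first (hypothetical helper)."""
    return sorted(counts.items(), key=lambda x: x[1], reverse=True)


# Output happens only after counting has finished.
for letter, count in by_descending_count(count_letters("Hello, world!")):
    print('{:8} appears {} times.'.format(letter, count))
```

Letters with equal counts keep the order in which they were first seen, because sorted is stable.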
However, it is better to use collections.Counter:
from collections import Counter

# content and invalid are defined as in your function
content = filter(lambda x: x not in invalid, content)
c = Counter(content)
for letter, count in c.most_common():  # descending order of counts
    print('{:8} appears {} times.'.format(letter, count))
# for letter, count in c.most_common(n):  # limit to the n most common
#     print('{:8} appears {} times.'.format(letter, count))
The display in descending order needs to happen outside your search loop; otherwise the characters are displayed in the order they are encountered.
Sorting in descending order is quite easy with the built-in sorted (you'll need to set the reverse argument!).
However, Python is batteries-included and already provides a Counter, so it could be as simple as:
from collections import Counter
from operator import itemgetter

def frequencies(filename):
    # Sets are especially optimized for fast lookups so this will be
    # a perfect fit for the invalid characters.
    invalid = set("‘'`,.?!:;-_\n—' '")
    # Using open in a with block makes sure the file is closed afterwards.
    with open(filename, 'r') as infile:
        # The "char for char ...." is a conditional generator expression
        # that feeds all characters to the counter that are not invalid.
        counter = Counter(char for char in infile.read().lower() if char not in invalid)
    # If you want to display the values:
    for char, charcount in sorted(counter.items(), key=itemgetter(1), reverse=True):
        print(char, charcount)
Counter also has a most_common method. Called without an argument it returns all characters and counts in descending order, so it could replace the sorted call above; passing x limits the result to the x most common entries.
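As a small illustration of how most_common behaves with and without an argument, consider the following sketch. The sample string "banana" is an assumption chosen only because its counts contain no ties.

```python
from collections import Counter

counter = Counter("banana")

# No argument: every (element, count) pair, highest count first.
everything = counter.most_common()

# With n: only the n most common pairs.
top_two = counter.most_common(2)

print(everything)
print(top_two)
```

So for a full listing, most_common() and sorted(counter.items(), key=itemgetter(1), reverse=True) are interchangeable.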