
I'm trying to count all letters in a txt file then display in descending order

As the title says:

This is where I'm at so far. My code works, but I'm having trouble displaying the results in order. Currently the letters are displayed in no particular order.

def frequencies(filename):
    infile=open(filename, 'r')
    wordcount={}
    content = infile.read()
    infile.close()
    counter = {}
    invalid = "‘'`,.?!:;-_\n—' '"

    for word in content:
        word = content.lower()
        for letter in word:
            if letter not in invalid:
                if letter not in counter:
                    counter[letter] = content.count(letter)
                    print('{:8} appears {} times.'.format(letter, counter[letter]))

Any help would be greatly appreciated.

Andrew asked Jan 08 '17 08:01


3 Answers

Dictionaries are not sorted by value (and before Python 3.7 they did not preserve insertion order either). Also, if you want to count items in a collection, you are better off using collections.Counter, which is optimized for this purpose and more Pythonic.

Then you can use Counter.most_common(N) to get the N most common items in your Counter object.
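
As a small illustration of most_common, here is a sketch on an in-memory string rather than a file. The sample word "banana" is only an assumption for the demo; any iterable of characters should behave the same way.

```python
from collections import Counter

# Hypothetical sample text, chosen so that no two counts are tied.
sample = "banana"
counter = Counter(sample)

# most_common(2) returns the two highest counts as (item, count) pairs,
# ordered from most to least frequent.
top_two = counter.most_common(2)
print(top_two)  # [('a', 3), ('n', 2)]
```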

As for opening the file, you can use the with statement, which closes the file automatically at the end of the block. It is also better not to print the final result inside your function; instead, you can make the function a generator by yielding the formatted lines and then print them whenever you want.

from collections import Counter

def frequencies(filename, top_n):
    with open(filename) as infile:
        content = infile.read()
    invalid = "‘'`,.?!:;-_\n—' '"
    counter = Counter(x for x in content if x not in invalid)
    for letter, count in counter.most_common(top_n):
        yield '{:8} appears {} times.'.format(letter, count)

Then use a for loop to iterate over the generator:

for line in frequencies(filename, 100):
    print(line)

Mazdak answered Oct 24 '22 06:10


You don't need to iterate over 'words' and then over the letters in them. When you iterate over a string (like content), you already get single characters (strings of length 1). You should also wait until the counting loop has finished before showing any output. After counting, you could sort manually:

for letter, count in sorted(counter.items(), key=lambda x: x[1], reverse=True):
    # do stuff
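
A fuller sketch of this count-first, sort-afterwards approach might look like the following. The helper name count_letters and the sample text are assumptions made for illustration, not part of the original code.

```python
def count_letters(text, invalid):
    """Count each character of text that is not in invalid (hypothetical helper)."""
    counter = {}
    for letter in text.lower():  # iterating a string yields single characters
        if letter not in invalid:
            counter[letter] = counter.get(letter, 0) + 1
    return counter

# Sample input, assumed for the demo only.
counts = count_letters("Hello, hello!", invalid=set(" ,!"))

# Sort only after counting is finished, highest count first.
ordered = sorted(counts.items(), key=lambda x: x[1], reverse=True)
for letter, count in ordered:
    print('{:8} appears {} times.'.format(letter, count))
```

Keeping the print loop outside the counting loop is what makes the ordering possible: the sort needs the complete counts.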

However, it is better to use collections.Counter:

from collections import Counter

content = filter(lambda x: x not in invalid, content)
c = Counter(content)
for letter, count in c.most_common():  # descending order of counts
    print('{:8} appears {} times.'.format(letter, count))
# for letter, count in c.most_common(n):  # limit to the n most common
#     print('{:8} appears {} times.'.format(letter, count))

user2390182 answered Oct 24 '22 06:10


The display step needs to be outside your counting loop; otherwise the letters are printed in the order they are encountered.

Sorting in descending order is quite easy with the built-in sorted (you'll need to set the reverse argument!).

However, Python is "batteries included" and already has a Counter, so it could be as simple as:

from collections import Counter
from operator import itemgetter

def frequencies(filename):
    # Sets are especially optimized for fast lookups so this will be
    # a perfect fit for the invalid characters.
    invalid = set("‘'`,.?!:;-_\n—' '")

    # Using open in a with block makes sure the file is closed afterwards.
    with open(filename, 'r') as infile:  
        # The "char for char ...." is a conditional generator expression
        # that feeds all characters to the counter that are not invalid.
        counter = Counter(char for char in infile.read().lower() if char not in invalid)

    # If you want to display the values:
    for char, charcount in sorted(counter.items(), key=itemgetter(1), reverse=True):
        print(char, charcount)

Counter also has a most_common method. Called without an argument, it returns all characters and counts in descending order, so it could replace the sorted call here; it is especially convenient if you only want the x most common characters.
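
For comparison, here is a minimal sketch of how sorted with itemgetter(1) relates to most_common. The sample string is an assumption for the demo. In CPython, most_common() with no argument returns every element in descending order of count, so the two results should match.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical sample text used only for this comparison.
counter = Counter("abracadabra")

# Explicit sort by count, highest first.
by_sorted = sorted(counter.items(), key=itemgetter(1), reverse=True)

# most_common() with no argument returns all (item, count) pairs.
by_most_common = counter.most_common()
print(by_sorted == by_most_common)

# Passing a number limits the result to that many entries.
top_three = counter.most_common(3)
print(top_three)
```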

MSeifert answered Oct 24 '22 04:10