 

Python Puzzle code review (spoiler)

I have been working on the problems presented in the Python Challenge. One of the problems asks you to sift through a mess of characters and pick out the rarest character(s).

My methodology was to read the characters from a text file and store each character and its number of occurrences as a key/value pair in a dictionary. I then sort the dictionary by value and invert it, so that the occurrence count is the key and the list of characters with that count is the value. Assuming that the rarest character occurs only once, I return the values whose key in this inverted dictionary equals one.

The input (funkymess.txt) looks like this:

%%$@$^_#)^)&!_+]!*@&^}@@%%+$&[(_@%+%$*^@$^!+]!&#)*}{}}!}]$[%}@[{@#_^{*......

The code is as follows:

from operator import itemgetter
characterDict = dict()

#put the characters in a dictionary
def putEncounteredCharactersInDictionary(lineStr):
    for character in lineStr:
        if character in characterDict:
            characterDict[character] = characterDict[character]+1
        else:
            characterDict[character] = 1

#Sort the character dictionary
def sortCharacterDictionary(characterDict):
    sortCharDict = dict()
    sortsortedDictionaryItems = sorted(characterDict.iteritems(),key = itemgetter(1))
    for key, value in sortsortedDictionaryItems:
        sortCharDict[key] = value
    return sortCharDict 

#invert the sorted character dictionary
def inverseSortedCharacterDictionary(sortedCharDict):
    inv_map = dict()
    for k, v in sortedCharDict.iteritems():
        inv_map[v] = inv_map.get(v, [])
        inv_map[v].append(k)
    return inv_map


f = open('/Users/Developer/funkymess.txt','r')
for line in f:
    #print line
    processline = line.rstrip('\n')
    putEncounteredCharactersInDictionary(processline)
f.close()

sortedCharachterDictionary = sortCharacterDictionary(characterDict)
#print sortedCharachterDictionary
inversedSortedCharacterDictionary = inverseSortedCharacterDictionary(sortedCharachterDictionary)
print inversedSortedCharacterDictionary[1]

Can somebody take a look and tell me whether I am on the right track? If possible, I'd also appreciate feedback on optimizations, best practices, and potential refactorings, both from a language and from an algorithmic standpoint.

Thanks

sc_ray asked Dec 10 '22 12:12



1 Answer

Refactoring: A Walkthrough

I want to walk you through the process of refactoring. Learning to program is not just about knowing the end result, which is what you usually get when you ask a question on Stack Overflow. It's about how to get to that answer yourself. When people post short, dense answers to a question like this, it's not always obvious how they arrived at their solutions.

So let's do some refactoring and see what we can do to simplify your code. We'll rewrite, delete, rename, and rearrange code until no more improvements can be made.

Simplify your algorithms

Python need not be so verbose. It is usually a code smell when you have explicit loops operating over lists and dicts in Python, rather than using list comprehensions and functions that operate on containers as a whole.

Use defaultdict to store character counts

A defaultdict(int) will generate entries when they are accessed if they do not exist. This lets us eliminate the if/else branch when counting characters.

from collections import defaultdict
characterDict = defaultdict(int)

def putEncounteredCharactersInDictionary(lineStr):
    for character in lineStr:
        characterDict[character] += 1
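To see what "generate entries when they are accessed" means in practice, here is a small sketch in Python 3 syntax (the sample string is made up). It also shows a side effect worth knowing: merely reading a missing key inserts it with the default value 0, which can surprise you if you later iterate over the dict.

```python
from collections import defaultdict

counts = defaultdict(int)  # int() returns 0, so missing keys start at 0

for ch in "%%$@$":  # made-up sample input
    counts[ch] += 1  # no KeyError: the first access creates the entry with 0

print(dict(counts))  # {'%': 2, '$': 2, '@': 1}

# Caution: a plain read of a missing key also creates it.
print(counts['z'])    # 0
print('z' in counts)  # True: 'z' is now stored with a count of 0
```

If you only want to look a value up, counts.get('z', 0) or a 'z' in counts test does not insert anything.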

Sorting dicts

In Python 2, which this code is written for, dictionaries don't guarantee any ordering on their keys. You cannot assume that the items are stored in the same order that you insert them, so sorting the dict entries and then putting them right back into another dict just scrambles them again. (Python 3.7 and later do preserve insertion order, but that doesn't help the Python 2 code here.)

This means that your function is basically a no-op. After you sort the items, keep them as a list of tuples to retain their sorting order. Removing the dict-building code reduces this function to a single line.

def sortCharacterDictionary(characterDict):
    return sorted(characterDict.iteritems(), key=itemgetter(1))
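A quick check of what sorted() actually returns, in Python 3 syntax (items() replaces iteritems()) and with invented counts: a list of (key, value) tuples, which is the ordered structure worth keeping. Because several characters can share a count, the second call also sorts by the character itself so ties come out in a predictable order; that tie-break is an addition, not something the original code does.

```python
from operator import itemgetter

# Invented counts, for illustration only.
counts = {'!': 3, 'e': 1, '#': 5, 'a': 1}

# Sort by count alone. sorted() is stable, so tied entries keep their
# dict iteration order (insertion order on Python 3.7+).
by_count = sorted(counts.items(), key=itemgetter(1))
print(by_count)  # [('e', 1), ('a', 1), ('!', 3), ('#', 5)]

# Sort by count, then by character, so ties are broken deterministically.
by_count_then_char = sorted(counts.items(), key=itemgetter(1, 0))
print(by_count_then_char)  # [('a', 1), ('e', 1), ('!', 3), ('#', 5)]
```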

Inverting dicts

Given the previous point, you won't actually have a dict any more after sorting. But assuming you did, this function is one of those cases where explicit looping is discouraged. In Python, always think about how you can operate over collections all at once rather than one item at a time.

def inverseSortedCharacterDictionary(sortedCharDict):
    return dict((v, k) for k, v in sortedCharDict.iteritems())

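All in one line we (1) iterate over the key/value pairs in the dict; (2) swap them into inverted value/key tuples; (3) create a dict out of these inverted tuples. Note one difference from your version: if several characters share the same count, only the last one survives, whereas your loop collected all of them in a list.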
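If you need to keep every character for a given count (the puzzle asks for the rarest character/s, plural), here is a Python 3 sketch that groups keys in lists instead of overwriting them. invert_grouped is a hypothetical name; the function is essentially your original loop written with defaultdict(list). The last line shows the one-line inversion on the same input for comparison.

```python
from collections import defaultdict

def invert_grouped(d):
    """Map each value of d to the list of keys that had that value."""
    inverted = defaultdict(list)
    for k, v in d.items():
        inverted[v].append(k)  # collect every key instead of overwriting
    return dict(inverted)

counts = {'a': 1, '!': 3, 'e': 1}
print(invert_grouped(counts))                   # {1: ['a', 'e'], 3: ['!']}
print(dict((v, k) for k, v in counts.items()))  # {1: 'e', 3: '!'}: 'a' was lost
```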

Comment and name wisely

Your method names are long and descriptive. There's no need to repeat the same information in comments. Use comments only when your code isn't self-descriptive, such as when you have a complex algorithm or an unusual construct that isn't immediately obvious.

On the naming front, your names are unnecessarily long. I would use shorter, more generic names. Instead of inverseSortedCharacterDictionary, try just invertedDict. That's all the function does: it inverts a dict. It doesn't matter whether it's passed a sorted character dict or any other kind of dict.

As a rule of thumb, name functions and variables for what they do rather than for the particular data they happen to handle today. Generic names encourage generic code, and more generic means more reusable.

characters = defaultdict(int)

def countCharacters(string):
    for ch in string:
        characters[ch] += 1

def sortedCharacters(characters):
    return sorted(characters.iteritems(), key=itemgetter(1))

def invertedDict(d):
    return dict((v, k) for k, v in d.iteritems())
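To make the reuse argument concrete, here is a hypothetical generic counter in Python 3 (count_items is an illustrative name, not from the original code). Because it knows nothing about characters, it works on any iterable of hashable items: the characters of a string or the words of a sentence.

```python
from collections import defaultdict

def count_items(items):
    """Count occurrences of each hashable item in any iterable."""
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return counts

print(dict(count_items("abca")))  # {'a': 2, 'b': 1, 'c': 1}
print(dict(count_items("to be or not to be".split())))
# {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

Returning a fresh dict, rather than updating a module-level one like characters, also makes the function easier to test and safe to call more than once.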

Reduce volume

Using temporary variables and helper functions is good programming practice, and I applaud you for doing so in your program. However, now that each helper is only one or two lines long, we probably don't need them any more.

Here's your program body after changing the functions as above:

f = open('funkymess.txt', 'r')

for line in f:
    countCharacters(line.rstrip('\n'))

f.close()

print sortedCharacters(characters)[0]
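One thing to watch: [0] prints a single (character, count) pair even if several characters tie for rarest. Here is a Python 3 sketch that returns all of them (rarest is a hypothetical helper). It finds the minimum count with min() instead of sorting, which is one pass over the k distinct characters rather than O(k log k). With only a few dozen distinct characters the speed difference is negligible; the main gain is that ties are not silently dropped.

```python
def rarest(counts):
    """Return every key whose count equals the minimum, sorted for stable output."""
    if not counts:
        return []
    lowest = min(counts.values())  # single pass, no sort needed
    return sorted(ch for ch, n in counts.items() if n == lowest)

print(rarest({'%': 4, 'a': 1, '$': 2, 'e': 1}))  # ['a', 'e']
```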

And then let's just go ahead and inline those helper methods since they're so simple. Here's the final program after all the refactoring:

Final program

#!/usr/bin/env python

from operator import itemgetter
from collections import defaultdict

characters = defaultdict(int)

f = open('funkymess.txt','r')

for line in f:
    for ch in line.rstrip('\n'):
        characters[ch] += 1

f.close()

print sorted(characters.iteritems(), key=itemgetter(1))[0]
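For comparison, the standard library's collections.Counter (available since Python 2.7) is a dict subclass built for exactly this count-then-rank job. Here is a hedged Python 3 sketch of the final program on top of it; count_characters is a hypothetical name, and it takes any iterable of lines so it can be tried without a file.

```python
from collections import Counter

def count_characters(lines):
    """Count every character across an iterable of lines, ignoring newlines."""
    counts = Counter()
    for line in lines:
        counts.update(line.rstrip('\n'))  # update() adds 1 per character in the string
    return counts

counts = count_characters(["%%$a@@\n", "$%%\n"])
print(counts.most_common())      # ordered from most to least common
print(counts.most_common()[-1])  # ('a', 1): the rarest character and its count
```

With the real input you would write with open('funkymess.txt') as f: print(count_characters(f).most_common()[-1]); the with block closes the file even if an exception is raised, replacing the explicit f.close(). Like [0] in the program above, most_common()[-1] reports only one of any tied rarest characters.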
John Kugelman answered Dec 20 '22 12:12
