
A value in a list, python

Every letter in the English language has a relative frequency of occurrence. These are the frequencies:

A       B       C       D       E       F       G       H       I
.0817   .0149   .0278   .0425   .1270   .0223   .0202   .0609   .0697
J       K       L       M       N       O       P       Q       R
.0015   .0077   .0402   .0241   .0675   .0751   .0193   .0009   .0599
S       T       U       V       W       X       Y       Z   
.0633   .0906   .0276   .0098   .0236   .0015   .0197   .0007

A list called letterGoodness is predefined as:

letterGoodness = [.0817,.0149,.0278,.0425,.1270,.0223,.0202,...

I need to find the "goodness" of a string. For example, the goodness of 'I EAT' is: .0697 + .1270 + .0817 + .0906 = .369. This is part of a bigger problem, but I need to solve this part first. I started like this:

def goodness(message):
   for i in L:
     for j in i:

So it will be enough to find out how to get the occurrence percentage of any character. Can you help me? The string contains only uppercase letters and spaces.

Reginald asked Dec 11 '22 23:12


1 Answer

letterGoodness is better as a dictionary. Then you can just do:

sum(letterGoodness.get(c,0) for c in yourstring.upper())
# .upper() is only there for defensive programming

To convert letterGoodness from your list to a dictionary, you can do:

import string
letterGoodness = dict(zip(string.ascii_uppercase,letterGoodness))
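Putting the two snippets above together, a minimal sketch could look like the following. The list values are copied from the table in the question; the function name goodness comes from the question's stub, and goodness_by_letter is a name made up here so the original list is not overwritten.

```python
import string

# Frequencies for A..Z, copied from the table in the question.
letterGoodness = [
    .0817, .0149, .0278, .0425, .1270, .0223, .0202, .0609, .0697,
    .0015, .0077, .0402, .0241, .0675, .0751, .0193, .0009, .0599,
    .0633, .0906, .0276, .0098, .0236, .0015, .0197, .0007,
]

# Map each uppercase letter to its frequency (hypothetical name).
goodness_by_letter = dict(zip(string.ascii_uppercase, letterGoodness))


def goodness(message):
    """Sum the frequencies of the letters; any other character counts as 0."""
    return sum(goodness_by_letter.get(c, 0) for c in message.upper())


print(round(goodness('I EAT'), 4))  # 0.369
```

The result is rounded only for display, since summing floats can leave a tiny error in the last digits.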

If you're guaranteed to only have uppercase letters and spaces, you can do:

letterGoodness = dict(zip(string.ascii_uppercase,letterGoodness))
letterGoodness[' '] = 0
sum(letterGoodness[c] for c in yourstring)

The performance gain here is probably pretty minimal, though, so I would favor the more robust version above.
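To illustrate what "more robust" means here, the sketch below compares the two lookups on input that breaks the "uppercase letters and spaces only" guarantee. The function names strict_goodness and robust_goodness are made up for this example, and the dictionary holds only the letters the demo needs (values taken from the question's table).

```python
# Only the entries needed for this demo; values are from the question's table.
letterGoodness = {'I': .0697, 'E': .1270, 'A': .0817, 'T': .0906, ' ': 0}


def strict_goodness(message):
    # Plain indexing: raises KeyError on any character not in the dict.
    return sum(letterGoodness[c] for c in message)


def robust_goodness(message):
    # .get with a default: unknown characters simply contribute 0.
    return sum(letterGoodness.get(c, 0) for c in message.upper())


print(round(robust_goodness('I eat!'), 4))  # 0.369

try:
    strict_goodness('I eat!')
except KeyError as exc:
    print('unexpected character:', exc)
```

Whether a KeyError or a silent 0 is preferable depends on whether bad input should be treated as a bug or tolerated.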


If you insist on keeping letterGoodness as a list (and I don't advise that), you can use the built-in ord to get the index (pointed out by cwallenpoole):

 ordA = ord('A')
 sum(letterGoodness[ord(c)-ordA] for c in yourstring if c in string.ascii_uppercase)
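As a small sketch of why the ord-based index works, and why the membership filter matters, the following prints the index computed for a few characters. The variable positions is made up for this example.

```python
import string

ordA = ord('A')

# Each uppercase letter maps to its 0-based position in the alphabet,
# which is also its index in a 26-element list ordered A..Z.
positions = [ord(c) - ordA for c in string.ascii_uppercase]
print(positions == list(range(26)))  # True

# Other characters give indices that do not belong to any letter.
print(ord(' ') - ordA)  # -33
print(ord('5') - ordA)  # -12
```

On a 26-element list, -33 would raise IndexError, while -12 is a valid negative index and would silently pick up another letter's value, so the filter is doing real work.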

I'm too lazy to timeit right now, but you may also want to define a temporary set to hold string.ascii_uppercase. It might make your function run a little faster (depending on how optimized str.__contains__ is compared to set.__contains__):

 ordA = ord('A')
 big_letters = set(string.ascii_uppercase)
 sum(letterGoodness[ord(c)-ordA] for c in yourstring.upper() if c in big_letters)
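For anyone who wants to check the str versus set membership question on their own machine, a rough timing sketch might look like this. The helper count_with, the test message, and the repetition count are arbitrary choices for illustration, and the numbers will vary by Python version and hardware.

```python
import string
import timeit

letters_str = string.ascii_uppercase
letters_set = set(string.ascii_uppercase)
message = 'I EAT' * 1000  # arbitrary test input


def count_with(container):
    # Count the characters of message that pass the membership test.
    return sum(1 for c in message if c in container)


# Time the same loop with each container type.
str_time = timeit.timeit(lambda: count_with(letters_str), number=100)
set_time = timeit.timeit(lambda: count_with(letters_set), number=100)
print(f'str: {str_time:.4f}s  set: {set_time:.4f}s')
```

This isolates the membership test only; the list lookup and the sum are left out, so it says nothing about the total run time of the full expression.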

mgilson answered Dec 21 '22 10:12