 

Count letter frequency in word list, excluding duplicates in the same word

I'm trying to find the most frequent letter in a list of words. The tricky part is that each letter should be counted at most once per word: if a letter appears several times in the same word, only its first occurrence counts. I need help counting letter frequencies across the whole list under that rule.

For example, if I have:

words = ["tree", "bone", "indigo", "developer"] 

The frequency will be:

letters = {'a': 0, 'b': 1, 'c': 0, 'd': 2, 'e': 3, 'f': 0, 'g': 1, 'h': 0, 'i': 1, 'j': 0, 'k': 0, 'l': 1, 'm': 0, 'n': 2, 'o': 3, 'p': 1, 'q': 0, 'r': 2, 's': 0, 't': 1, 'u': 0, 'v': 1, 'w': 0, 'x': 0, 'y': 0, 'z': 0}

As you can see from the letters dictionary, 'e' is 3 and not 6, because repeated occurrences of a letter within the same word are ignored.
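To make the rule concrete, here is a quick check (a sketch using only built-ins) comparing the raw number of 'e' characters with the number of words that contain at least one 'e':

```python
words = ["tree", "bone", "indigo", "developer"]

# Raw occurrences: every 'e' in every word is counted.
raw_e = sum(word.count("e") for word in words)

# Per-word presence: each word contributes at most 1 (True counts as 1).
per_word_e = sum("e" in word for word in words)

print(raw_e, per_word_e)  # 6 3
```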

This is the algorithm I came up with, implemented in Python:

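import string

letters = dict.fromkeys(string.ascii_lowercase, 0)  # every letter a-z starts at 0

# Problem: after the first letter of a word, a letter that was already counted
# in an earlier word (e.g. the 'e' in "bone" after "tree") is never counted again.
for word in words:
    count = 0
    for letter in word:
        if letter.isalpha():
            if ((letters[letter.lower()] > 0 and count == 0) or
                    (letters[letter.lower()] == 0 and count == 0)):
                letters[letter.lower()] += 1
                count = 1
            elif letters[letter.lower()] == 0 and count == 1:
                letters[letter.lower()] += 1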

But it still doesn't produce the right counts, and I'm out of ideas. I'd be grateful for any help finding a working solution.
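One way to repair the loop above, sketched here as an illustration rather than the asker's final code, is to drop the `count` flag and instead track the letters already counted in the current word with a set. The name `seen` is a hypothetical choice, and this version only counts the ASCII letters a-z that are keys of `letters`:

```python
import string

words = ["tree", "bone", "indigo", "developer"]

# Start every letter a-z at 0, matching the expected `letters` dictionary.
letters = dict.fromkeys(string.ascii_lowercase, 0)

for word in words:
    seen = set()  # letters already counted for this word
    for letter in word.lower():
        # Only count a-z letters, and only the first time per word.
        if letter in letters and letter not in seen:
            letters[letter] += 1
            seen.add(letter)

print(letters["e"], letters["o"], letters["d"])  # 3 3 2
```

Resetting `seen` for each word is what enforces "once per word"; the global `letters` dictionary is never consulted to decide whether to count.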

MattGeek asked Jan 16 '19 19:01



2 Answers

A variation on @Primusa's answer that doesn't use update:

from collections import Counter

words = ["tree", "bone", "indigo", "developer"]
counts = Counter(c for word in words for c in set(word.lower()) if c.isalpha())

Output:

Counter({'e': 3, 'o': 3, 'r': 2, 'd': 2, 'n': 2, 'p': 1, 'i': 1, 'b': 1, 'v': 1, 'g': 1, 'l': 1, 't': 1}) 

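Converting each word to a set removes its repeated letters, so each letter is counted at most once per word; the generator then iterates over every set and the Counter tallies the results.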
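Since the original goal is the most frequent letter, the resulting Counter can answer that directly. Because 'e' and 'o' tie here, this sketch collects every letter that reaches the highest count rather than relying on `most_common(1)`, which returns only one of the tied letters (it assumes `words` is non-empty, since `max` fails on an empty sequence):

```python
from collections import Counter

words = ["tree", "bone", "indigo", "developer"]
counts = Counter(c for word in words for c in set(word.lower()) if c.isalpha())

# Highest per-word count across all letters.
top = max(counts.values())

# All letters that reach that count, sorted for a stable result.
most_frequent = sorted(letter for letter, n in counts.items() if n == top)

print(top, most_frequent)  # 3 ['e', 'o']
```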

Dani Mesejo answered Sep 21 '22 17:09



Create a Counter object, then update it with the set of letters in each word:

from collections import Counter

wordlist = ["tree", "bone", "indigo", "developer"]

c = Counter()
for word in wordlist:
    c.update(set(word.lower()))

print(c)

Output:

Counter({'e': 3, 'o': 3, 'r': 2, 'n': 2, 'd': 2, 't': 1, 'b': 1, 'i': 1, 'g': 1, 'v': 1, 'p': 1, 'l': 1}) 

Note that letters absent from wordlist don't appear in the Counter. This is fine: looking up a missing key in a Counter returns 0, much like a defaultdict(int), except that a Counter does not insert the missing key on lookup.
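A short sketch of that behavior, reusing the loop above; the final dictionary comprehension is an added illustration for anyone who needs a full a-z table shaped like the question's `letters`:

```python
import string
from collections import Counter

wordlist = ["tree", "bone", "indigo", "developer"]

c = Counter()
for word in wordlist:
    c.update(set(word.lower()))

# A letter that never appeared reads as 0 ...
print(c["z"])    # 0
# ... and, unlike defaultdict, the lookup did not add the key.
print("z" in c)  # False

# Build a complete a-z table from the Counter.
letters = {ch: c[ch] for ch in string.ascii_lowercase}
print(letters["a"], letters["e"])  # 0 3
```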

Primusa answered Sep 21 '22 17:09
