
Counting the number of unique words in a list

Using the following code from https://stackoverflow.com/a/11899925, I can tell whether a word is unique by checking whether it was used once or more than once:

helloString = ['hello', 'world', 'world']
count = {}
for word in helloString :
   if word in count :
      count[word] += 1
   else:
      count[word] = 1

But if I had a list with hundreds of words, how would I count the number of unique words in that list?

For example, my code has:

uniqueWordCount = 0
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
count = {}
for word in helloString :
   if word in count :
      count[word] += 1
   else:
      count[word] = 1

How would I be able to set uniqueWordCount to 6? Usually, I am really good at solving these types of algorithmic puzzles, but I have been unsuccessful with figuring this one out. I feel as if it is right beneath my nose.

Justin Bush asked Nov 28 '22 08:11



1 Answer

The best way to solve this is to use the set collection type. A set is a collection in which all elements are unique. Therefore:

unique = set(['one', 'two', 'two'])
len(unique)  # 2
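
Applied to the list from the question, this could look like the sketch below. The snake_case names are only illustrative, not from the original post. Note that it counts distinct words, so 'world' is counted once and the result is 7:

```python
# Word list taken from the question.
hello_string = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']

# Building a set drops the duplicate 'world'.
distinct_words = set(hello_string)

# The size of the set is the number of distinct words.
unique_word_count = len(distinct_words)
print(unique_word_count)  # 7
```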

You can use a set from the outset, adding words to it as you go:

unique.add('three')

This will throw out any duplicates as they are added. Or you can collect all the elements in a list and pass the list to set(), which removes the duplicates at that point. The first example above uses this second pattern; the two can also be combined:

unique = set(['one', 'two', 'two'])
unique.add('three')

# unique now contains {'one', 'two', 'three'}
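
One caveat: the question expects 6 for its sample list, which matches the number of words that occur exactly once rather than the number of distinct words (7). If that is the intended meaning of "unique", a set alone is not enough, because it discards the counts. A minimal sketch, assuming the standard library's collections.Counter is acceptable in place of the hand-written count dictionary (the variable names are illustrative):

```python
from collections import Counter

# Word list taken from the question.
hello_string = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']

# Counter builds the same word -> occurrences mapping as the loop in the question.
counts = Counter(hello_string)

# Distinct words: every key counts once, however often the word appears.
distinct_count = len(counts)

# Words that appear exactly once: 'world' is excluded because it appears twice.
appearing_once_count = sum(1 for n in counts.values() if n == 1)

print(distinct_count)        # 7
print(appearing_once_count)  # 6
```

The same sum over count.values() works with the plain dictionary built in the question, so switching to Counter is optional.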

Read more about sets in Python.

Matthew MacGregor answered Dec 05 '22 15:12
