I'm trying to learn Python. Here is the relevant part of the exercise:
For each word, check to see if the word is already in a list. If the word is not in the list, add it to the list.
Here is what I've got.
```python
fhand = open('romeo.txt')
output = []
for line in fhand:
    words = line.split()
    for word in words:
        if word is not output:
            output.append(word)
print(sorted(output))
```
Here is what I get.
['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun', 'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']
Note duplication (and, is, sun, etc).
How do I get only unique values?
Using a set: with Python's set() you can easily keep only the unique values. A set stores each value once, even if it is inserted more than once. Insert the values of the list into a set with list_set = set(list1), then convert the set back to a list to print it.

Using NumPy's unique(): numpy.unique() finds the unique elements of an array and returns them sorted.
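A minimal sketch of the set() approach described above. The name list1 comes from the text; its values here are made up for illustration. Because a set has no defined order, the result is sorted before printing so the output is deterministic.

```python
# Hypothetical input list containing duplicates
list1 = ['sun', 'is', 'the', 'sun', 'is']

# Inserting into a set keeps only one copy of each value
list_set = set(list1)

# Convert back to a list; sort it because a set has no defined order
unique_list = sorted(list_set)
print(unique_list)  # ['is', 'sun', 'the']
```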
To eliminate duplicates from a list, you can maintain an auxiliary list and check each word against it.
```python
myList = ['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun', 'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']
auxiliaryList = []
for word in myList:
    # Append only words not already stored
    if word not in auxiliaryList:
        auxiliaryList.append(word)
print(auxiliaryList)
```
output:
['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']
This is very simple to understand, and the code is self-explanatory. However, the simplicity comes at the expense of efficiency: each `not in` check is a linear scan over a growing list, which turns a linear algorithm into a quadratic one.
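To make the quadratic claim concrete, here is a sketch that writes the list-membership scan out as an explicit inner loop and counts element comparisons. The helper name dedupe_counting_comparisons is hypothetical, and the inner loop only approximates what CPython's `in` does on a list (it also short-circuits on object identity). When all n words are distinct, the count is n(n - 1)/2.

```python
def dedupe_counting_comparisons(words):
    """Deduplicate with an auxiliary list, counting element comparisons.

    The inner loop roughly mirrors `word not in auxiliary`: it compares
    `word` with each stored element until a match is found.
    """
    auxiliary = []
    comparisons = 0
    for word in words:
        found = False
        for existing in auxiliary:
            comparisons += 1
            if existing == word:
                found = True
                break
        if not found:
            auxiliary.append(word)
    return auxiliary, comparisons


# Worst case: every word is distinct, so each scan covers the whole list.
for n in (10, 100, 1000):
    words = ['w%d' % i for i in range(n)]
    _, comparisons = dedupe_counting_comparisons(words)
    print(n, comparisons)  # grows as n * (n - 1) / 2
```

Multiplying the input size by 10 multiplies the work by roughly 100, which is the quadratic degradation the answer describes.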
If the order is not important, you could use set().
A set object is an unordered collection of distinct hashable objects.
Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
Since the average-case cost of a membership check in a hash table is O(1), using a set is more efficient.
```python
auxiliaryList = list(set(myList))
```
output:
['and', 'envious', 'already', 'fair', 'is', 'through', 'pale', 'yonder', 'what', 'sun', 'Who', 'But', 'moon', 'window', 'sick', 'east', 'breaks', 'grief', 'with', 'light', 'It', 'Arise', 'kill', 'the', 'soft', 'Juliet']
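Hashability is what makes set() work here: strings are hashable, so they can be set members. A short sketch with made-up sample values shows the boundary. Tuples of hashable values also work, but a list of lists raises TypeError, so the set-based approach does not apply to such data directly.

```python
# Strings are hashable, so duplicates collapse in a set
words = ['and', 'is', 'and']
print(sorted(set(words)))  # ['and', 'is']

# Tuples of hashable values are hashable too
pairs = [(1, 2), (1, 2), (3, 4)]
print(sorted(set(pairs)))  # [(1, 2), (3, 4)]

# Lists are mutable and unhashable, so they cannot be set members
nested = [[1, 2], [1, 2]]
error_message = None
try:
    set(nested)
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```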
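Instead of the `is not` operator, you should use the `not in` operator to check whether the item is in the list. `is not` tests object identity, so `word is not output` compares each string with the list object itself; it is always true, which is why every word gets appended:

```python
if word not in output:
```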
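Putting that fix back into the original loop gives the following minimal sketch. To keep it self-contained, the loop is wrapped in a hypothetical helper unique_words() that accepts any iterable of lines, such as an open file handle; the sample lines are made up and are not the real contents of romeo.txt.

```python
def unique_words(lines):
    """Return the distinct words found in `lines`, sorted."""
    output = []
    for line in lines:
        for word in line.split():
            # `not in` tests membership; `is not` would test object identity
            if word not in output:
                output.append(word)
    return sorted(output)


sample = ["But soft what light", "It is the east and Juliet is the sun"]
print(unique_words(sample))

# With the real file:
# with open('romeo.txt') as fhand:
#     print(unique_words(fhand))
```

Note that sorted() places capitalized words before lowercase ones, matching the ordering in the question's output.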
BTW, using a set is a lot more efficient (see Time complexity):
```python
with open('romeo.txt') as fhand:
    output = set()
    for line in fhand:
        words = line.split()
        output.update(words)
```
UPDATE: A set does not preserve the original order. To preserve the order, use the set as an auxiliary data structure:
```python
output = []
seen = set()
with open('romeo.txt') as fhand:
    for line in fhand:
        words = line.split()
        for word in words:
            if word not in seen:  # faster than `word not in output`
                seen.add(word)
                output.append(word)
```
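The same order-preserving approach can be wrapped in a reusable function so it can be checked without the file. The helper name unique_in_order() and the sample lines are hypothetical. The last two lines show a different technique not used in the answer: dict.fromkeys(), which relies on dicts keeping insertion order (guaranteed from Python 3.7) and gives the same first-seen order.

```python
def unique_in_order(lines):
    """Return the distinct words in `lines` in first-seen order."""
    output = []
    seen = set()  # average O(1) membership test
    for line in lines:
        for word in line.split():
            if word not in seen:
                seen.add(word)
                output.append(word)
    return output


sample = ["the sun the moon", "and the sun"]
print(unique_in_order(sample))  # ['the', 'sun', 'moon', 'and']

# Alternative (Python 3.7+): dict keys are unique and keep insertion order
words = [w for line in sample for w in line.split()]
print(list(dict.fromkeys(words)))  # ['the', 'sun', 'moon', 'and']
```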