
Add only unique values to a list in python

Tags:

python

list

I'm trying to learn Python. Here is the relevant part of the exercise:

For each word, check to see if the word is already in a list. If the word is not in the list, add it to the list.

Here is what I've got.

fhand = open('romeo.txt')
output = []

for line in fhand:
    words = line.split()
    for word in words:
        if word is not output:
            output.append(word)

print sorted(output)

Here is what I get.

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and',
 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is',
 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun',
 'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']

Note the duplicates (and, is, sun, etc.).

How do I get only unique values?

Tim Elhajj asked Feb 19 '17 23:02




2 Answers

To eliminate duplicates from a list, you can maintain an auxiliary list and check each item against it.

myList = ['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and',
          'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is', 'kill', 'light',
          'moon', 'pale', 'sick', 'soft', 'sun', 'sun', 'the', 'the', 'the',
          'through', 'what', 'window', 'with', 'yonder']

auxiliaryList = []
for word in myList:
    if word not in auxiliaryList:
        auxiliaryList.append(word)

output:

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east',
 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick',
 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']

This is simple to understand, and the code is self-explanatory. However, the simplicity comes at the expense of efficiency: each membership check is a linear scan over a growing list, so the algorithm as a whole degrades from linear to quadratic time.
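To make the quadratic growth concrete, here is a rough Python 3 sketch that counts element comparisons rather than measuring time. The helper name count_comparisons and the generated input are made up for illustration; the count is a simple model of how a list membership test scans from left to right.

```python
def count_comparisons(words):
    """Model the element comparisons done by `word not in aux` (hypothetical helper)."""
    aux = []
    comparisons = 0
    for word in words:
        if word not in aux:
            # A miss has to look at every element collected so far.
            comparisons += len(aux)
            aux.append(word)
        else:
            # A hit stops at the first equal element.
            comparisons += aux.index(word) + 1
    return comparisons


# 1000 distinct words (assumed worst case): 0 + 1 + ... + 999 comparisons.
distinct = [str(i) for i in range(1000)]
print(count_comparisons(distinct))  # 499500, i.e. 1000 * 999 / 2
```

Doubling the number of distinct words roughly quadruples the count, which is the quadratic behaviour described above.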


If the order is not important, you could use set().

A set object is an unordered collection of distinct hashable objects.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

Since the average case for membership checking in a hash-table is O(1), using a set is more efficient.

auxiliaryList = list(set(myList)) 

output:

['and', 'envious', 'already', 'fair', 'is', 'through', 'pale', 'yonder',
 'what', 'sun', 'Who', 'But', 'moon', 'window', 'sick', 'east', 'breaks',
 'grief', 'with', 'light', 'It', 'Arise', 'kill', 'the', 'soft', 'Juliet']
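The order of the set-based result is arbitrary and can differ between runs. Since the question prints a sorted list anyway, one option is to sort the set directly. A minimal Python 3 sketch, where my_list is shortened sample data and not the full list from above:

```python
# Shortened sample data (an assumption for illustration).
my_list = ["sun", "and", "the", "and", "Arise", "sun"]

# set() drops the duplicates; sorted() returns a new list in a fixed order.
unique_sorted = sorted(set(my_list))

print(unique_sorted)  # ['Arise', 'and', 'sun', 'the']
```

Uppercase letters sort before lowercase ones in the default string ordering, which is why 'Arise' comes first, just as in the question's output.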
Tony Tannous answered Sep 27 '22 19:09



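Instead of the is not operator, you should use the not in operator to check whether the item is in the list. The is not operator compares object identity, and a word is never the same object as the list itself, so the original condition is always true and every word gets appended: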

if word not in output: 
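The difference between the two operators can be seen side by side in a small Python 3 sketch. The words list is made-up sample data standing in for the contents of the file, and the names buggy and fixed are only for illustration:

```python
# Made-up sample data standing in for the words read from the file.
words = ["the", "sun", "and", "the", "moon", "and"]

buggy = []
fixed = []
for word in words:
    # Identity test: a string is never the same object as the list,
    # so this is True for every word and nothing is filtered out.
    if word is not buggy:
        buggy.append(word)
    # Membership test: True only if no equal element is in the list yet.
    if word not in fixed:
        fixed.append(word)

print(buggy)  # ['the', 'sun', 'and', 'the', 'moon', 'and']
print(fixed)  # ['the', 'sun', 'and', 'moon']
```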

BTW, using a set is a lot more efficient (see Time complexity):

with open('romeo.txt') as fhand:
    output = set()
    for line in fhand:
        words = line.split()
        output.update(words)

UPDATE: A set does not preserve the original order. To preserve the order, use the set as an auxiliary data structure:

output = []
seen = set()
with open('romeo.txt') as fhand:
    for line in fhand:
        words = line.split()
        for word in words:
            if word not in seen:  # faster than `word not in output`
                seen.add(word)
                output.append(word)
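As an alternative that is not part of the answer above: on Python 3.7 and later, dictionaries keep insertion order, so dict.fromkeys can do the same order-preserving deduplication in one line. A minimal sketch, assuming the words are already in a list (sample data below, not the real file):

```python
# Made-up sample data standing in for the words read from the file.
words = ["the", "sun", "and", "the", "moon", "and"]

# Keys are unique and keep the order of first insertion (Python 3.7+).
unique_in_order = list(dict.fromkeys(words))

print(unique_in_order)  # ['the', 'sun', 'and', 'moon']
```

Like the set-based version, this requires the items to be hashable.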
falsetru answered Sep 27 '22 18:09
