I have a list of item, where I want to remove the occurrence of any duplicates for one item, but keep any duplicates for the rest. I.e. I start with the following list
mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]
I want to remove any duplicates of 0
but keep the duplicates of 1
and 9
.
My current solution is the following:
mylist = [i for i in mylist if i != 0]
mylist.add(0)
Is there a nice way of keeping one occurrence of 0
besides the following?
for i in mylist:
if mylist.count(0) > 1:
mylist.remove(0)
The second approach takes more than double the time for this example.
Clarification:
currently, I don't care about the order of items in the list, as I currently sort it after it has been created and cleaned, but that might change later.
currently, I only need to remove duplicates for one specific item (that is 0
in my example)
The solution:
[0] + [i for i in mylist if i]
looks good enough, except if 0
is not in mylist
, in which case you're wrongly adding 0.
Besides, adding 2 lists like this isn't very good performance wise. I'd do:
newlist = [i for i in mylist if i]
if len(newlist) != len(mylist): # 0 was removed, add it back
newlist.append(0)
(or using filter newlist = list(filter(None,mylist))
which could be slightly faster because there are no native python loops)
Appending to a list at the last position is very efficient (list
object uses pre-allocation and most of the time no memory is copied). The length test trick is O(1)
and allows to avoid to test 0 in mylist
It sounds like a better data structure for you to use would be collections.Counter
(which is in the standard library):
import collections
counts = collections.Counter(mylist)
counts[0] = 1
mylist = list(counts.elements())
Here is a generator-based approach with approximately O(n) complexity that also preserves the order of the original list:
In [62]: def remove_dup(lst, item):
...: temp = [item]
...: for i in lst:
...: if i != item:
...: yield i
...: elif i == item and temp:
...: yield temp.pop()
...:
In [63]: list(remove_dup(mylist, 0))
Out[63]: [4, 1, 2, 6, 1, 0, 9, 8, 9]
Also if you are dealing with larger lists you can use following vectorized and optimized approach using Numpy:
In [80]: arr = np.array([4, 1, 2, 6, 1, 0, 9, 8, 0, 9])
In [81]: mask = arr == 0
In [82]: first_ind = np.where(mask)[0][0]
In [83]: mask[first_ind] = False
In [84]: arr[~mask]
Out[84]: array([4, 1, 2, 6, 1, 0, 9, 8, 9])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With