
Remove both duplicates in multiple lists python

I have three lists X, Y, Z as follows:

X: [1, 1, 2, 3, 4, 5, 5, 5]
Y: [3, 3, 2, 6, 7, 1, 1, 2]
Z: [0, 0, 1, 1, 2, 3, 3, 4]

I am trying to remove every set of values that is duplicated at the same index across the lists (removing all copies, not keeping one) to get reduced lists as follows. All three lists always have the same length, both initially and after the reduction:

X: [2, 3, 4, 5]
Y: [2, 6, 7, 2]
Z: [1, 1, 2, 4]

I tried using zip(X, Y, Z), but I can't index the result, and dict.fromkeys removes only one of the duplicates and leaves the other in the new list. I want to remove both.

Any help is appreciated!

mb567 asked Jun 13 '18 14:06


2 Answers

Using collections.Counter and zip, you can count how many times each (X, Y, Z) triplet occurs.

Then keep only the triplets that occur exactly once, via a generator expression.

from collections import Counter

X = [1, 1, 2, 3, 4, 5, 5, 5]
Y = [3, 3, 2, 6, 7, 1, 1, 2]
Z = [0, 0, 1, 1, 2, 3, 3, 4]

c = Counter(zip(X, Y, Z))

X, Y, Z = zip(*(k for k, v in c.items() if v == 1))

print(X, Y, Z, sep='\n')

(2, 3, 4, 5)
(2, 6, 7, 2)
(1, 1, 2, 4)
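
One edge case worth noting: if every triplet is duplicated, zip(*...) receives nothing and the three-way unpacking raises a ValueError. The sketch below wraps the same Counter idea in a hypothetical helper, drop_repeated_rows (a name made up here, not part of any library), which returns empty lists in that case and accepts any number of equal-length lists. Treat it as one possible approach rather than the definitive one.

```python
from collections import Counter


def drop_repeated_rows(*lists):
    """Return copies of the lists without any index whose row of values repeats.

    Hypothetical helper for illustration; assumes all lists have equal length.
    """
    rows = list(zip(*lists))
    counts = Counter(rows)
    # Walk the rows in their original order and keep those seen exactly once.
    kept = [row for row in rows if counts[row] == 1]
    if not kept:
        # Nothing survived: return one empty list per input list.
        return tuple([] for _ in lists)
    return tuple(list(col) for col in zip(*kept))


X = [1, 1, 2, 3, 4, 5, 5, 5]
Y = [3, 3, 2, 6, 7, 1, 1, 2]
Z = [0, 0, 1, 1, 2, 3, 3, 4]

X_new, Y_new, Z_new = drop_repeated_rows(X, Y, Z)
print(X_new, Y_new, Z_new, sep='\n')
```

Because the helper iterates over the original rows rather than over the Counter's keys, the surviving values stay in their original order on any Python version.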

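Note: if ordering is a requirement and you are not using Python 3.7+ (where dictionaries, and therefore Counter, are guaranteed to preserve insertion order; CPython 3.6 does so as an implementation detail), you can create an "OrderedCounter" instead by subclassing both collections.Counter and collections.OrderedDict.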
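
A minimal sketch of such an "OrderedCounter", assuming the common recipe of inheriting from both Counter and OrderedDict. The class name is only a label; the standard library does not ship it.

```python
from collections import Counter, OrderedDict


class OrderedCounter(Counter, OrderedDict):
    """Counter that remembers the order in which keys were first seen."""


X = [1, 1, 2, 3, 4, 5, 5, 5]
Y = [3, 3, 2, 6, 7, 1, 1, 2]
Z = [0, 0, 1, 1, 2, 3, 3, 4]

counts = OrderedCounter(zip(X, Y, Z))

# Keep only the triplets seen exactly once, in first-seen order.
unique = [triplet for triplet, n in counts.items() if n == 1]
X_new, Y_new, Z_new = (list(col) for col in zip(*unique))
print(X_new, Y_new, Z_new, sep='\n')
```

On Python 3.7+ a plain Counter already behaves this way, so the subclass should only matter on older interpreters.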

jpp answered Oct 16 '22 07:10


It's convenient to use the pandas library for this task. Create a DataFrame from the lists and call DataFrame.drop_duplicates with keep=False, which removes all duplicated rows:

import pandas as pd

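dct = {
    "X": [1, 1, 2, 3, 4, 5, 5, 5],
    "Y": [3, 3, 2, 6, 7, 1, 1, 2],
    "Z": [0, 0, 1, 1, 2, 3, 3, 4],
}
d = pd.DataFrame(dct)
# keep=False drops every row that has a duplicate, not just the later copies.
# drop_duplicates returns a new DataFrame; d itself is left unchanged.
print(d.drop_duplicates(keep=False))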
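
Since the question asks for lists rather than a DataFrame, the remaining rows can be converted back. The following is a rough sketch assuming pandas is installed; the names df and reduced are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "X": [1, 1, 2, 3, 4, 5, 5, 5],
    "Y": [3, 3, 2, 6, 7, 1, 1, 2],
    "Z": [0, 0, 1, 1, 2, 3, 3, 4],
})

# keep=False drops every row that occurs more than once.
reduced = df.drop_duplicates(keep=False)

# Convert each remaining column back to a plain Python list.
X, Y, Z = (reduced[col].tolist() for col in ("X", "Y", "Z"))
print(X, Y, Z, sep="\n")
```

The surviving rows keep their original index labels (2, 3, 4 and 7 here); calling reset_index(drop=True) on the result would renumber them if that matters.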

koPytok answered Oct 16 '22 09:10