I have n lists of numbers, and I want to make sure that each list contains only elements unique to that particular list, i.e. there are no "shared" duplicates across any of the others.
This is really easy to do with two lists, but a little trickier with n lists.
e.g.
mylist = [
[1, 2, 3, 4],
[2, 5, 6, 7],
[4, 2, 8, 9]
]
becomes:
mylist = [
[1, 3],
[5, 6, 7],
[8, 9]
]
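For context, the two-list case alluded to above is just removing the intersection from each list; a minimal sketch with throwaway sample lists:

```python
# Two-list case: drop any element that appears in both lists
a = [1, 2, 3, 4]
b = [2, 5, 6, 7]
shared = set(a) & set(b)  # elements common to both, here {2}
a = [x for x in a if x not in shared]
b = [x for x in b if x not in shared]
# a -> [1, 3, 4], b -> [5, 6, 7]
```

Generalizing this pairwise approach to n lists is what the answers below address.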
from collections import Counter
from itertools import chain

mylist = [
    [1, 2, 3, 4],
    [2, 5, 6, 7, 7],
    [4, 2, 8, 9]
]
counts = Counter(chain.from_iterable(map(set, mylist)))
[[i for i in sublist if counts[i] == 1] for sublist in mylist]
# [[1, 3], [5, 6, 7, 7], [8, 9]]
This does it in linear time, in two passes. I'm assuming you want to preserve duplicates within a list; if not, it can be simplified a bit:
>>> import collections, itertools
>>> counts = collections.defaultdict(int)
>>> for i in itertools.chain.from_iterable(set(l) for l in mylist):
...     counts[i] += 1
...
>>> for l in mylist:
...     l[:] = (i for i in l if counts[i] == 1)
...
>>> mylist
[[1, 3], [5, 6, 7], [8, 9]]
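If within-list duplicates should *not* be preserved either, one possible variant of the `Counter` approach is to dedupe each sublist while filtering. `dict.fromkeys` is used here only as an order-preserving dedup; any other dedup would work too:

```python
from collections import Counter
from itertools import chain

mylist = [[1, 2, 3, 4], [2, 5, 6, 7, 7], [4, 2, 8, 9]]
# Count each element once per sublist (hence the set())
counts = Counter(chain.from_iterable(map(set, mylist)))
# dict.fromkeys drops within-list repeats while keeping first-seen order
result = [[i for i in dict.fromkeys(sub) if counts[i] == 1] for sub in mylist]
# result -> [[1, 3], [5, 6, 7], [8, 9]]
```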
Since you don't care about order, you can easily remove duplicates using set subtraction and converting back to a list. Here it is as a monster one-liner:
>>> mylist = [
... [1, 2, 3, 4],
... [2, 5, 6, 7],
... [4, 2, 8, 9]
... ]
>>> mynewlist = [list(set(thislist) - set(element for sublist in mylist for element in sublist if sublist is not thislist)) for thislist in mylist]
>>> mynewlist
[[1, 3], [5, 6, 7], [8, 9]]
Note: This is not very efficient, because the set of elements from the other lists is rebuilt for every row. Whether that is a problem depends on your data size.
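One way to avoid that recomputation is to build the set of shared elements once up front (a sketch, essentially the `Counter` idea from the earlier answer):

```python
from collections import Counter

mylist = [[1, 2, 3, 4], [2, 5, 6, 7], [4, 2, 8, 9]]
# Count each element once per sublist, then collect those seen in 2+ sublists
counts = Counter(e for sub in mylist for e in set(sub))
shared = {e for e, c in counts.items() if c > 1}  # here {2, 4}
mynewlist = [[e for e in sub if e not in shared] for sub in mylist]
# mynewlist -> [[1, 3], [5, 6, 7], [8, 9]]
```

This keeps the original within-list order, unlike the set-subtraction one-liner.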