Quickest way to dedupe list in dict [duplicate]

I have a dict containing lists and need a fast way to dedupe the lists.

I know how to dedupe a list in isolation using the set() function, but in this case I want a fast way of iterating through the dict, deduping each list on the way.

hello = {'test1':[2,3,4,2,2,5,6], 'test2':[5,5,8,4,3,3,8,9]}

I'd like the result to look like this:

hello = {'test1':[2,3,4,5,6], 'test2':[5,8,4,3,9]}

I don't necessarily need the original order of the lists preserved, though.

I've tried using a set like this, but it's not quite correct (it doesn't seem to iterate properly and I lose the first key):

for key, value in hello.items(): goodbye = {key: set(value)}
>>> goodbye
{'test2': set([8, 9, 3, 4, 5])}

EDIT: Following PM 2Ring's comment below, I'm now populating the dict differently to avoid duplicates in the first place. Previously I was using lists, but a set ignores a value that is already present, so duplicates never get added:

>>> my_numbers = {}
>>> my_numbers['first'] = [1,2,2,2,6,5]
>>> from collections import defaultdict
>>> final_list = defaultdict(set)
>>> for n in my_numbers['first']: final_list['test_first'].add(n)
... 
>>> final_list['test_first']
set([1, 2, 5, 6])

As you can see, the final output is a deduped set, as required.

John Honan asked Jul 17 '15 15:07


1 Answer

The loop is iterating correctly; the problem is that you rebind goodbye to a brand-new one-item dict on every pass, so only the last key survives. Create an empty dict before the loop, then assign each deduped value to its key inside the loop.

goodbye = {}
for key, value in hello.items():
    goodbye[key] = set(value)
>>> goodbye
{'test1': set([2, 3, 4, 5, 6]), 'test2': set([8, 9, 3, 4, 5])}
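
For what it's worth, the same loop can be written as a dict comprehension, which avoids the separate empty-dict step. This is only a sketch assuming Python 3 (where a set prints as {2, 3, 4, 5, 6} rather than set([...])); goodbye_lists is a name made up here for the variant that converts each set back into a list, in no particular order.

```python
# Sketch, assuming Python 3. `hello` is the dict from the question.
hello = {'test1': [2, 3, 4, 2, 2, 5, 6], 'test2': [5, 5, 8, 4, 3, 3, 8, 9]}

# Build the whole result in one expression; each list becomes a set.
goodbye = {key: set(value) for key, value in hello.items()}

# Hypothetical variant: if lists are needed back, wrap each set in list().
# The order of the resulting lists is arbitrary.
goodbye_lists = {key: list(set(value)) for key, value in hello.items()}

print(goodbye)
```

Whether this is measurably faster than the explicit loop depends on the data; the main gain is brevity.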

Also, since sets don't preserve order, if you do want to keep the original order you can write a simple function that walks the list and returns a new list, skipping any value it has already added.

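def uniqueList(li):
    # Collect each value the first time it is seen, preserving order.
    newList = []
    for x in li:
        # Linear scan of newList, so this is O(n^2) for long lists.
        if x not in newList:
            newList.append(x)
    return newList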


goodbye = {}
for key, value in hello.items():
    goodbye[key] = uniqueList(value)
>>> goodbye
{'test1': [2, 3, 4, 5, 6], 'test2': [5, 8, 4, 3, 9]}
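
Since the question asks for something fast, here is a possible order-preserving variant that tracks already-seen values in a set, so each membership check is roughly constant time instead of a scan of the output list. It is a sketch, not a benchmarked answer: it assumes Python 3 and hashable list items, and unique_list_seen and goodbye_alt are names invented for this example. The dict.fromkeys line relies on dicts keeping insertion order, which the language guarantees from Python 3.7 on.

```python
# Sketch, assuming Python 3 and hashable list items.
def unique_list_seen(items):
    """Return items without duplicates, keeping the first occurrence of each."""
    seen = set()      # values already emitted; set lookups are O(1) on average
    result = []
    for x in items:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result


hello = {'test1': [2, 3, 4, 2, 2, 5, 6], 'test2': [5, 5, 8, 4, 3, 3, 8, 9]}

goodbye = {key: unique_list_seen(value) for key, value in hello.items()}

# Alternative for Python 3.7+: dict keys are unique and keep insertion order.
goodbye_alt = {key: list(dict.fromkeys(value)) for key, value in hello.items()}

print(goodbye)
```

For short lists like the ones in the question the difference from uniqueList is unlikely to matter; it starts to show once the lists are long.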

SuperBiasedMan answered Nov 12 '22 00:11