 

finding duplicates in a list of lists

I am using Python 2.7 and am trying to de-duplicate a list of lists and merge the values of the duplicates.

Right now I have:

original_list = [['a', 1], ['b', 1], ['a', 1], ['b', 1], ['b', 2], ['c', 2], ['b', 3]] 

I want to match on the first element of each nested list and then add the values of the second element. I want to end up with this (the order of the final list does not matter):

ideal_output = [['a', 2], ['b', 7], ['c', 2]] 

So far I have some code that will find me the duplicate values based on the first element of each nested list:

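duplicates_list = []  # must exist before the loop appends to it
for item in original_list:
    matches = -1  # every item matches itself once, so start at -1
    for x in original_list:
        if item[0] == x[0]:
            matches += 1
    if matches >= 1:
        if item[0] not in duplicates_list:
            duplicates_list.append(item[0])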

From here I need to search for all duplicates_list items that are in original_list and add up the values, but I am not sure what the best way to do that is.
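One way to finish this two-step idea is sketched below (a sketch, not the accepted approach; the names `keys` and `merged` are made up for illustration). Note that `ideal_output` also contains `['c', 2]`, which is not a duplicate, so the sketch collects every distinct key rather than only the duplicated ones, then sums the second elements for each key.

```python
original_list = [['a', 1], ['b', 1], ['a', 1], ['b', 1], ['b', 2], ['c', 2], ['b', 3]]

# Collect each distinct first element once, in first-seen order.
# This includes keys such as 'c' that appear only once.
keys = []
for item in original_list:
    if item[0] not in keys:
        keys.append(item[0])

# For every key, add up the second element of all sublists with that key.
merged = []
for key in keys:
    total = sum(x[1] for x in original_list if x[0] == key)
    merged.append([key, total])

print(merged)  # [['a', 2], ['b', 7], ['c', 2]]
```

This rescans `original_list` once per key, so it is quadratic in the worst case; the dictionary-based answers below do the same work in a single pass.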

asked Nov 06 '13 11:11 by e h



2 Answers

Lots of good answers, but they all use rather more code than I would for this, so here's my take, for what it's worth:

totals = {}
for k, v in original_list:
    totals[k] = totals.get(k, 0) + v
# totals = {'a': 2, 'c': 2, 'b': 7}
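For comparison, here is the same accumulation written with `collections.defaultdict(int)`, a standard-library alternative rather than what this answer uses. `defaultdict(int)` calls `int()`, which returns 0, the first time a missing key is looked up, so `+=` works without the `.get(k, 0)` fallback.

```python
from collections import defaultdict

original_list = [['a', 1], ['b', 1], ['a', 1], ['b', 1], ['b', 2], ['c', 2], ['b', 3]]

# A missing key is created with int() == 0 before += adds v to it.
totals = defaultdict(int)
for k, v in original_list:
    totals[k] += v

print(dict(totals))
```

Because `defaultdict` is a `dict` subclass, the `items()` steps that follow work on it unchanged. One trade-off: merely reading `totals[k]` for a missing key also inserts that key with value 0.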

Once you have a dict like that, from any of these answers, you can call items() to get a list of (key, value) tuples (in Python 3, a dict_items view that behaves like one):

totals.items() # => dict_items([('a', 2), ('c', 2), ('b', 7)]) 

Then call list() on each tuple to get a list of lists:

[list(t) for t in totals.items()] # => [['a', 2], ['c', 2], ['b', 7]] 

And sort if you want them in order:

sorted([list(t) for t in totals.items()]) # => [['a', 2], ['b', 7], ['c', 2]]   
answered Oct 15 '22 01:10 by Mark Reed
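Putting the steps of the answer above together, a minimal end-to-end sketch could look like this (`merge_pairs` is a hypothetical name, not something from the answer). It assumes the keys are mutually comparable, since `sorted()` has to order them.

```python
def merge_pairs(pairs):
    """Hypothetical helper: merge [key, value] pairs that share a key by
    summing their values; return a key-sorted list of [key, total] lists."""
    totals = {}
    for k, v in pairs:
        totals[k] = totals.get(k, 0) + v
    # Sorting (key, total) tuples orders them by key. Keys are unique,
    # so the totals never act as a tie-breaker.
    return [list(t) for t in sorted(totals.items())]


original_list = [['a', 1], ['b', 1], ['a', 1], ['b', 1], ['b', 2], ['c', 2], ['b', 3]]
print(merge_pairs(original_list))  # [['a', 2], ['b', 7], ['c', 2]]
```

Sorting the tuples before converting them to lists gives the same result as `sorted([list(t) for t in totals.items()])`, because lists and tuples compare element by element in the same way.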


>>> from collections import Counter
>>> lst = [['a', 1], ['b', 1], ['a', 1], ['b', 1], ['b', 2], ['c', 2], ['b', 3]]
>>> c = Counter(x for x, n in lst for _ in xrange(n))
>>> c
Counter({'b': 7, 'a': 2, 'c': 2})
>>> map(list, c.iteritems())
[['a', 2], ['c', 2], ['b', 7]]
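The session above is Python 2 (`xrange`, `iteritems`). A Python 3 port might look like the following sketch. In Python 3, `map()` returns a lazy iterator, so the list of lists is built explicitly. The expansion trick only makes sense for non-negative integer values: `range()` of a negative number is empty (so those values are silently ignored), and a float raises `TypeError`.

```python
from collections import Counter

lst = [['a', 1], ['b', 1], ['a', 1], ['b', 1], ['b', 2], ['c', 2], ['b', 3]]

# Expand each [key, n] pair into n copies of key and let Counter tally them.
# range replaces Python 2's xrange.
c = Counter(key for key, n in lst for _ in range(n))

# map() is lazy in Python 3, so build the list of lists explicitly.
pairs = [list(t) for t in c.items()]
print(sorted(pairs))  # [['a', 2], ['b', 7], ['c', 2]]
```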

Or, alternatively, without expanding each pair (a, b) into b copies of a (suggested by @hcwhsa):

>>> from collections import Counter
>>> lst = [['a', 1], ['b', 1], ['a', 1], ['b', 1], ['b', 2], ['c', 2], ['b', 3]]
>>> c = sum((Counter(**{k: v}) for k, v in lst), Counter())
>>> c
Counter({'b': 7, 'a': 2, 'c': 2})
>>> map(list, c.iteritems())
[['a', 2], ['c', 2], ['b', 7]]
answered Oct 15 '22 00:10 by Maciej Gol
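A Python 3 sketch of the second `Counter` variant, plus one caveat worth knowing. `Counter({k: v})` accepts any hashable key, whereas `Counter(**{k: v})` only works when `k` is a string, because keyword argument names must be strings. Also, `Counter`'s `+` operator keeps only positive totals, so a key whose values cancel out (or go negative) disappears from the summed result. Adding into a single `Counter` with `+=` keeps such totals. The extra `['x', 1], ['x', -1]` pairs below are made-up data used only to show the difference.

```python
from collections import Counter

lst = [['a', 1], ['b', 1], ['a', 1], ['b', 1], ['b', 2], ['c', 2], ['b', 3]]

# Python 3 port: Counter({k: v}) instead of Counter(**{k: v}).
summed = sum((Counter({k: v}) for k, v in lst), Counter())
print(sorted(summed.items()))  # [('a', 2), ('b', 7), ('c', 2)]

# Counter + Counter drops keys whose total is zero or negative.
print(Counter({'x': 1}) + Counter({'x': -1}))  # Counter()

# Accumulating with += keeps every total and builds no temporary Counters.
totals = Counter()
for k, v in lst + [['x', 1], ['x', -1]]:
    totals[k] += v
print(totals['x'])  # 0
```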