
removing duplicate entries from multi-d array in python

Tags:

python

arrays

I have a 2-d array

 xx=[['a',1],['b',2],['c',3]]

Now I'm trying to remove duplicate entries from it. For a simple 1-D array, code like

xx=list(set(xx))

would work. But calling set on a 2-D array gives an error:

temp = set(xx)
TypeError: unhashable type: 'list'

One workaround would be to serialize each element of xx, call list(set()) on the resulting array, and then unserialize the elements again.
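For concreteness, the workaround described above might look like the following sketch. Using `json` as the serializer is an assumption made for illustration; any round-trippable string form would do. The result is sorted only because a set does not keep the original order.

```python
import json

xx = [['a', 1], ['b', 2], ['c', 3], ['c', 3]]

# Serialize each inner list to a string, which is hashable.
serialized = set(json.dumps(row) for row in xx)

# Unserialize back to lists; sort because set iteration order is arbitrary.
unique = sorted(json.loads(s) for s in serialized)
print(unique)  # [['a', 1], ['b', 2], ['c', 3]]
```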

Is there a better way to do this in Python?

Neo asked Sep 29 '10 06:09


2 Answers

Convert each element to a tuple and then use set.

>>> xx=[['a',1],['b',2],['c',3],['c',3]]
>>> set(tuple(element) for element in xx)
set([('a', 1), ('b', 2), ('c', 3)])
>>> 

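Tuples, unlike lists, can be hashed (as long as their own elements are hashable), so they can be stored in a set. Once you are done, convert the elements back to lists. Note that a set does not preserve the original order. Putting everything together: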

>>> [list(t) for t in set(tuple(element) for element in xx)]
[['a', 1], ['b', 2], ['c', 3]]
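If the original order matters, the same tuple idea can be combined with a "seen" set. This is a minimal sketch rather than part of the answer above; the helper name `dedupe_rows` is hypothetical.

```python
def dedupe_rows(rows):
    """Return rows without duplicates, keeping the first occurrence of each."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row)  # lists are unhashable, so use a tuple as the set key
        if key not in seen:
            seen.add(key)
            result.append(list(row))
    return result

xx = [['c', 3], ['a', 1], ['b', 2], ['c', 3]]
print(dedupe_rows(xx))  # [['c', 3], ['a', 1], ['b', 2]]
```

Like the set-based one-liner, this assumes every item inside each row is itself hashable.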
Manoj Govindan answered Sep 17 '22 18:09


One year after the excellent answer of Manoj Govindan, I'm adding my piece of advice:

Floating-point numbers are a pain when you need to compare values for equality, which is exactly what a set does when it removes duplicates.

For instance,

>>> 0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1 == 0.1*10
False

That's because most decimal fractions, such as 0.1, have no exact representation in binary floating point. The computer stores the nearest base-2 approximation, and the tiny rounding errors accumulate differently depending on how a value was computed.

So be really careful when comparing floats!
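One possible way to apply this to de-duplication is to round the floats before building the tuple key. The sketch below is only an illustration: the helper name `dedupe_float_rows` and the default of 9 decimal places are assumptions, and the right tolerance depends on your data.

```python
def dedupe_float_rows(rows, ndigits=9):
    """Drop rows that are equal once their floats are rounded to ndigits places."""
    seen = set()
    result = []
    for row in rows:
        # Round only the floats; leave strings, ints, etc. untouched.
        key = tuple(round(v, ndigits) if isinstance(v, float) else v for v in row)
        if key not in seen:
            seen.add(key)
            result.append(list(row))
    return result

# Add 0.1 ten times; the result is slightly below 1.0.
total = 0.0
for _ in range(10):
    total += 0.1

rows = [['a', total], ['a', 0.1 * 10], ['b', 2.0]]
print(len(set(tuple(r) for r in rows)))  # 3: exact comparison sees no duplicates
print(len(dedupe_float_rows(rows)))      # 2: the two 'a' rows now collapse
```

Rounding is a blunt tool: two values that are very close but fall on opposite sides of a rounding boundary still get different keys.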

Jerry answered Sep 17 '22 18:09