Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Removing rows with duplicates in a NumPy array

I have a (N,3) array of numpy values:

>>> vals = numpy.array([[1,2,3],[4,5,6],[7,8,7],[0,4,5],[2,2,1],[0,0,0],[5,4,3]])
>>> vals
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 7],
       [0, 4, 5],
       [2, 2, 1],
       [0, 0, 0],
       [5, 4, 3]])

I'd like to remove rows from the array that have a duplicate value. For example, the result for the above array should be:

>>> duplicates_removed
array([[1, 2, 3],
       [4, 5, 6],
       [0, 4, 5],
       [5, 4, 3]])

I'm not sure how to do this efficiently with numpy without looping (the array could be quite large). Anyone know how I could do this?

like image 999
jterrace Avatar asked Sep 15 '11 23:09

jterrace


People also ask

How do you remove repeats from an array in Python?

Method 1: Using *set() This is the fastest and smallest method to achieve a particular task. It first removes the duplicates and returns a dictionary which has to be converted to list.

How do I remove duplicate rows from a dataset in Python?

You can set 'keep=False' in the drop_duplicates() function to remove all the duplicate rows.

How do you remove duplicates from an array?

We can remove duplicate element in an array by 2 ways: using temporary array or using separate index. To remove the duplicate element from array, the array must be in sorted order. If array is not sorted, you can sort it by calling Arrays. sort(arr) method.


1 Answers

This is an option:

import numpy
vals = numpy.array([[1,2,3],[4,5,6],[7,8,7],[0,4,5],[2,2,1],[0,0,0],[5,4,3]])
a = (vals[:,0] == vals[:,1]) | (vals[:,1] == vals[:,2]) | (vals[:,0] == vals[:,2])
vals = numpy.delete(vals, numpy.where(a), axis=0)
like image 110
Benjamin Avatar answered Oct 26 '22 09:10

Benjamin