I have an (N, 3) NumPy array of values:
>>> vals = numpy.array([[1,2,3],[4,5,6],[7,8,7],[0,4,5],[2,2,1],[0,0,0],[5,4,3]])
>>> vals
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 7],
       [0, 4, 5],
       [2, 2, 1],
       [0, 0, 0],
       [5, 4, 3]])
I'd like to remove every row that contains the same value more than once. For example, the result for the above array should be:
>>> duplicates_removed
array([[1, 2, 3],
       [4, 5, 6],
       [0, 4, 5],
       [5, 4, 3]])
I'm not sure how to do this efficiently with numpy without looping (the array could be quite large). Anyone know how I could do this?
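One option is to compare the three columns pairwise, which gives a boolean mask of the rows that contain a repeated value, and then delete those rows:
import numpy
vals = numpy.array([[1,2,3],[4,5,6],[7,8,7],[0,4,5],[2,2,1],[0,0,0],[5,4,3]])
# True for each row in which any two of the three columns are equal
a = (vals[:,0] == vals[:,1]) | (vals[:,1] == vals[:,2]) | (vals[:,0] == vals[:,2])
# numpy.where(a)[0] holds the indices of those rows; delete them along axis 0
vals = numpy.delete(vals, numpy.where(a)[0], axis=0)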
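The pairwise comparison above is tied to exactly three columns. If the array might have a different number of columns, a possible generalisation is to sort each row and check whether any neighbouring values are equal. This is only a sketch: the helper name `drop_rows_with_repeats` is made up for illustration, and it assumes a 2-D array whose values compare meaningfully with `==`.

```python
import numpy

def drop_rows_with_repeats(arr):
    """Return the rows of a 2-D array whose values are all distinct.

    Hypothetical helper for illustration; not part of NumPy.
    """
    # Sort within each row so that equal values end up next to each other.
    sorted_rows = numpy.sort(arr, axis=1)
    # A row contains a repeat if any adjacent pair in its sorted form is equal.
    has_repeat = (sorted_rows[:, 1:] == sorted_rows[:, :-1]).any(axis=1)
    # Boolean indexing keeps the original, unsorted rows in their original order.
    return arr[~has_repeat]

vals = numpy.array([[1,2,3],[4,5,6],[7,8,7],[0,4,5],[2,2,1],[0,0,0],[5,4,3]])
duplicates_removed = drop_rows_with_repeats(vals)
print(duplicates_removed)
```

For the example array this keeps the same four rows as the pairwise version. The sort costs a little extra work per row, but the code no longer needs to change when the column count does.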