
Remove duplicate rows of a numpy array [duplicate]

Tags:

python

numpy

How can I remove duplicate rows from a two-dimensional NumPy array?

data = np.array([[1,8,3,3,4],
                 [1,8,9,9,4],
                 [1,8,3,3,4]])

The answer should be as follows:

ans = array([[1,8,3,3,4],
             [1,8,9,9,4]])

If there are two rows that are the same, then I would like to remove one "duplicate" row.

Roman asked Jun 28 '15 07:06




1 Answer

You can use numpy.unique. Because you want unique rows rather than unique elements, the rows first need to be converted to tuples. Starting from:

import numpy as np

data = np.array([[1,8,3,3,4],
                 [1,8,9,9,4],
                 [1,8,3,3,4]])

Applying np.unique directly to the data array flattens it and results in this:

>>> uniques = np.unique(data)
>>> uniques
array([1, 3, 4, 8, 9])

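This returns the unique elements of the flattened array, not the unique rows. Passing a list of tuples to np.unique does not help either, because the list is converted back into an array and flattened in the same way. Instead, convert each row to a tuple and deduplicate the tuples with a set:

new_array = [tuple(row) for row in data]
# A set keeps one copy of each row tuple; sorted() makes the order deterministic.
uniques = np.array(sorted(set(new_array)))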

which gives:

>>> uniques
array([[1, 8, 3, 3, 4],
       [1, 8, 9, 9, 4]])
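The tuple idea can also be written so that the rows keep their original order. The following is a minimal sketch and not part of the original answer: the helper name `unique_rows_ordered` is hypothetical, and it assumes a non-empty 2-D array whose rows contain hashable scalar values.

```python
import numpy as np

def unique_rows_ordered(arr):
    """Return the distinct rows of a 2-D array in first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so each row tuple is kept only the first time it appears.
    seen = dict.fromkeys(tuple(row) for row in arr)
    return np.array(list(seen), dtype=arr.dtype)

data = np.array([[1, 8, 3, 3, 4],
                 [1, 8, 9, 9, 4],
                 [1, 8, 3, 3, 4]])

result = unique_rows_ordered(data)
print(result)
```

Building Python tuples row by row is slow for large arrays, so this is mainly useful for small inputs or for older NumPy versions.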

UPDATE

In NumPy 1.13 and later, np.unique accepts an axis argument, so np.unique(data, axis=0) returns the unique rows directly.
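As a short sketch of the axis-based call on the array from the question (the variable names here are illustrative): np.unique returns the rows in sorted order, so the second half uses the return_index option to recover the order in which the rows first appeared, in case that matters.

```python
import numpy as np

data = np.array([[1, 8, 3, 3, 4],
                 [1, 8, 9, 9, 4],
                 [1, 8, 3, 3, 4]])

# axis=0 treats each row as one item; the result is sorted lexicographically.
unique_rows = np.unique(data, axis=0)
print(unique_rows)

# return_index gives the index of the first occurrence of each unique row;
# sorting those indices restores the original row order.
_, first_idx = np.unique(data, axis=0, return_index=True)
unique_rows_in_order = data[np.sort(first_idx)]
print(unique_rows_in_order)
```

For this input both results contain the same two rows, since the first occurrences already appear in sorted order.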

Srivatsan answered Oct 02 '22 23:10
