Remove duplicate rows of a numpy array [duplicate]

Tags:

python

numpy

How can I remove duplicate rows of a 2-dimensional numpy array?

data = np.array([[1,8,3,3,4],
                 [1,8,9,9,4],
                 [1,8,3,3,4]])

The answer should be as follows:

ans = array([[1,8,3,3,4],
             [1,8,9,9,4]])

If there are two rows that are the same, then I would like to remove one "duplicate" row.

881

asked Jun 28 '15 07:06

Roman

1 Answer

You can use np.unique, but because you want unique rows rather than unique elements, each row has to be treated as a single item. Starting from the example data:

import numpy as np

data = np.array([[1,8,3,3,4],
                 [1,8,9,9,4],
                 [1,8,3,3,4]])

Applying np.unique directly to the data array returns this:

>>> np.unique(data)
array([1, 3, 4, 8, 9])

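That is the set of unique elements of the flattened array, not the unique rows. Calling np.unique on a list of tuples flattens it in the same way, so to compare whole rows, convert each row to a tuple and deduplicate the tuples with a set:

new_array = [tuple(row) for row in data]
uniques = np.array(sorted(set(new_array)))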

which prints:

>>> uniques
array([[1, 8, 3, 3, 4],
       [1, 8, 9, 9, 4]])
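If the original row order matters, a tuple-based variant can keep the first occurrence of each row. The following is only a minimal sketch, assuming Python 3.7+ (where dicts preserve insertion order); dedupe_rows_keep_order is a hypothetical helper name, not part of NumPy, and the sample data is reordered here to make the effect visible:

```python
import numpy as np


def dedupe_rows_keep_order(arr):
    """Return the rows of a 2-D array with later duplicates dropped."""
    # dict.fromkeys keeps the first occurrence of each hashable key, in order
    unique_rows = list(dict.fromkeys(tuple(row) for row in arr))
    return np.array(unique_rows)


# Hypothetical sample: the duplicated row appears first and last
data = np.array([[1, 8, 9, 9, 4],
                 [1, 8, 3, 3, 4],
                 [1, 8, 9, 9, 4]])

result = dedupe_rows_keep_order(data)
print(result)
```

Unlike a sorted result, the rows come back in the order they first appeared in the input.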

UPDATE

In NumPy 1.13 and later, np.unique accepts an axis argument, so np.unique(data, axis=0) returns the unique rows directly, with no tuple conversion needed.
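As a rough illustration of the axis-based call, here is a short sketch using the data from the question. It assumes NumPy 1.13 or later; the variable names unique_rows, first_idx and unique_in_order are chosen here for illustration only:

```python
import numpy as np

data = np.array([[1, 8, 3, 3, 4],
                 [1, 8, 9, 9, 4],
                 [1, 8, 3, 3, 4]])

# axis=0 treats each row as one item; the result is sorted row-wise
unique_rows = np.unique(data, axis=0)

# return_index gives the index of the first occurrence of each unique row;
# sorting those indices restores the original order of appearance
_, first_idx = np.unique(data, axis=0, return_index=True)
unique_in_order = data[np.sort(first_idx)]

print(unique_rows)
print(unique_in_order)
```

The second form may be useful when the deduplicated rows should stay in their original order, since np.unique on its own returns them sorted.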

149

answered Oct 02 '22 23:10

Srivatsan
