Removing rows with duplicates in a NumPy array

Tags: performance, python, vectorization, numpy

I have an (N, 3) NumPy array of values:

>>> vals = numpy.array([[1,2,3],[4,5,6],[7,8,7],[0,4,5],[2,2,1],[0,0,0],[5,4,3]])
>>> vals
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 7],
       [0, 4, 5],
       [2, 2, 1],
       [0, 0, 0],
       [5, 4, 3]])

I'd like to remove every row that contains the same value more than once. For example, the result for the above array should be:

>>> duplicates_removed
array([[1, 2, 3],
       [4, 5, 6],
       [0, 4, 5],
       [5, 4, 3]])

I'm not sure how to do this efficiently in NumPy without a Python-level loop (the array could be quite large). Does anyone know how I could do this?

asked Sep 15 '11 23:09

jterrace

1 Answer

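One option is to compare the three columns pairwise and delete every row in which any pair matches:

import numpy

vals = numpy.array([[1,2,3],[4,5,6],[7,8,7],[0,4,5],[2,2,1],[0,0,0],[5,4,3]])
# True for each row in which at least two of the three columns are equal
a = (vals[:,0] == vals[:,1]) | (vals[:,1] == vals[:,2]) | (vals[:,0] == vals[:,2])
# numpy.where(a)[0] holds the indices of the flagged rows; delete them along axis 0
vals = numpy.delete(vals, numpy.where(a)[0], axis=0)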
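The pairwise comparison above is written out for exactly three columns. If the array could have more columns, one possible generalization is to sort each row and check whether any two neighbouring values are equal. This is only a sketch under that assumption; `drop_rows_with_repeats` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def drop_rows_with_repeats(arr):
    """Return the rows of a 2-D array whose values are all distinct."""
    arr = np.asarray(arr)
    # Sort each row so that equal values end up next to each other.
    s = np.sort(arr, axis=1)
    # A row contains a repeat if any adjacent pair in its sorted form is equal.
    has_repeat = (s[:, 1:] == s[:, :-1]).any(axis=1)
    # Boolean indexing keeps only the rows without repeats, in original order.
    return arr[~has_repeat]

vals = np.array([[1,2,3],[4,5,6],[7,8,7],[0,4,5],[2,2,1],[0,0,0],[5,4,3]])
duplicates_removed = drop_rows_with_repeats(vals)
print(duplicates_removed)
```

Sorting costs a little more per row than three direct comparisons, so for a fixed width of three the explicit version may well be faster; the sorted variant mainly saves writing out every column pair by hand.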

answered Oct 26 '22 09:10

Benjamin
