Find unique elements of floating point array in numpy (with comparison using a delta value)

I've got an ndarray of floating point values in numpy and I want to find the unique values of this array. Of course, this has problems because of floating point accuracy... so I want to be able to set a delta value to use for the comparisons when working out which elements are unique.

Is there a way to do this? At the moment I am simply doing:

unique(array) 

Which gives me something like:

array([       -Inf,  0.62962963,  0.62962963,  0.62962963,  0.62962963,     0.62962963]) 

where the values that look the same (to the number of decimal places being displayed) are obviously slightly different.
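A minimal illustration of the effect (hypothetical values, not my real data): the "same" number reached by different arithmetic paths gives several distinct floats, and np.unique keeps them all.

import numpy as np

# hypothetical example: nominally equal values from different arithmetic paths
a = np.array([0.1 + 0.2, 0.3, 0.15 * 2, 3 * 0.1])
print(np.unique(a))   # [0.3 0.3] -- two distinct floats that display identically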

asked Mar 24 '11 by robintw



2 Answers

Another possibility is to just round to the nearest desirable tolerance:

np.unique(a.round(decimals=4)) 

where a is your original array.
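For example (illustrative values, not taken from the question):

import numpy as np

a = np.array([0.62962963, 0.62962964, 0.629629, -np.inf])
print(np.unique(a.round(decimals=4)))   # the near-duplicates collapse to a single 0.6296 (plus -inf)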

Edit: Just to note that my solution and @unutbu's are nearly identical speed-wise (mine is maybe 5% faster) according to my timings, so either is a good solution.

Edit #2: This is meant to address Paul's concern. It is definitely slower and there may be some optimizations one can make, but I'm posting it as-is to demonstrate the strategy:

import numpy as np

def eclose(a, b, rtol=1.0000000000000001e-05, atol=1e-08):
    # element-wise "close enough" test, same tolerances as np.allclose
    return np.abs(a - b) <= (atol + rtol * np.abs(b))

x = np.array([6.4, 6.500000001, 6.5, 6.51])
y = x.flat.copy()
y.sort()
ci = 0

U = np.empty((0,), dtype=y.dtype)

while ci < y.size:
    ii = eclose(y[ci], y)             # everything within tolerance of the current value
    mi = np.max(ii.nonzero())         # index of the last member of that run
    U = np.concatenate((U, [y[mi]]))  # keep that member as the representative
    ci = mi + 1                       # jump past the run

print(U)

This should be decently fast if there are many repeated values within the precision range, but if many of the values are unique, then this is going to be slow. Also, it may be better to set U up as a list and append through the while loop, but that falls under 'further optimization'.
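For what it's worth, a sketch of that list-based variant (my own guess at the 'further optimization', reusing eclose and the sorted y from the code above):

U = []
ci = 0
while ci < y.size:
    ii = eclose(y[ci], y)   # same tolerance test as above
    mi = np.max(ii.nonzero())
    U.append(y[mi])         # append to a Python list instead of concatenating arrays
    ci = mi + 1
U = np.array(U)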

answered Sep 20 '22 by JoshAdel


Don't floor and round both fail the OP's requirement in some cases?

np.floor([5.99999999, 6.0])     # array([ 5.,  6.])
np.round([6.50000001, 6.5], 0)  # array([ 7.,  6.])

The way I would do it (and this may not be optimal, and is surely slower than the other answers) is something like this:

import numpy as np

TOL = 1.0e-3
a = np.random.random((10, 10))
i = np.argsort(a.flat)                   # indices that sort the flattened array
d = np.append(True, np.diff(a.flat[i]))  # gaps between consecutive sorted values
result = a.flat[i[d > TOL]]              # keep one value wherever the gap exceeds TOL

Of course this method will exclude all but one member of any run of values that fall within the tolerance of one another, which means you may end up with only a single 'unique' value even when the max - min of the array is larger than the tolerance.
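A hypothetical example of that chaining effect, with values chosen so that each consecutive gap is below TOL but the overall spread is not:

import numpy as np

TOL = 1.0e-3
a = np.array([0.0, 0.0009, 0.0018])      # max - min = 0.0018 > TOL
i = np.argsort(a.flat)
d = np.append(True, np.diff(a.flat[i]))
print(a.flat[i[d > TOL]])                # [0.] -- the whole chain collapses to one value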

Here is essentially the same algorithm, but easier to understand and should be faster as it avoids an indexing step:

a = np.random.random((10,))
b = a.copy()
b.sort()
d = np.append(True, np.diff(b))
result = b[d > TOL]

The OP may also want to look into scipy.cluster (for a fancy version of this method) or numpy.digitize (for a fancy version of the other two methods).
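As a rough sketch of the numpy.digitize idea (my own interpretation, not code from the answers): bucket the values into bins of width TOL and keep one representative per occupied bin.

import numpy as np

TOL = 1.0e-3
a = np.random.random(100)

bins = np.arange(a.min(), a.max() + TOL, TOL)   # bin edges spaced TOL apart
idx = np.digitize(a, bins)                      # which bin each value falls into
result = np.array([a[idx == k].min() for k in np.unique(idx)])   # one value per occupied bin

Note that binning is not quite the same as a pairwise tolerance test: two values closer than TOL can still straddle a bin edge and end up in different bins.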

answered Sep 22 '22 by Paul