 

How to eliminate the extra minus sign when rounding negative numbers towards zero in numpy?

I have a simple question about the fix and floor functions in NumPy. When rounding negative numbers that are larger than -1 towards zero, NumPy rounds them correctly to zero but leaves a negative sign. This sign interferes with my custom unique_rows function, which uses np.ascontiguousarray to compare the elements of the array byte by byte, so the sign breaks the uniqueness check. Both round and fix behave the same way in this regard.

>>> np.fix(-1e-6)
array(-0.0)
>>> np.round(-1e-6)
-0.0

Any insights on how to get rid of the sign? I thought about using the np.sign function, but that comes with extra computational cost.

asked Nov 06 '14 by Arash_D_B


2 Answers

The issue you're seeing with -0. and +0. is part of the specification of how floats are supposed to behave (IEEE 754). In some circumstances one needs this distinction; see, for example, the references linked from the docs for np.around.

It's also worth noting that the two zeros compare as equal, so

np.array(-0.)==np.array(+0.) 
# True
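
Even though they compare as equal, the two zeros can still be told apart at the bit level, which is exactly what matters for a byte-wise comparison. As a minimal sketch, using the standard ufunc np.signbit, which reports the sign bit directly:

import numpy as np

np.array(-0.) == np.array(+0.)
# True: equal under comparison
np.signbit(-0.)
# True: the sign bit is set
np.signbit(+0.)
# False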

That is, I think the problem is more likely with your uniqueness comparison. For example:

a = np.array([-1., -0., 0., 1.])
np.unique(a)
#  array([-1., -0.,  1.])

If you want to keep the numbers as floating point but have all the zeros the same, you could use:

x = np.linspace(-2, 2, 6)
#  array([-2. , -1.2, -0.4,  0.4,  1.2,  2. ])
y = x.round()
#  array([-2., -1., -0.,  0.,  1.,  2.])
y[y==0.] = 0.
#  array([-2., -1.,  0.,  0.,  1.,  2.])

# or  
y += 0.
#  array([-2., -1.,  0.,  0.,  1.,  2.])    

Note, though, that you do have to do this bit of extra work, since you are working around the floating-point specification. (The y += 0. trick works because IEEE 754 arithmetic, in the default round-to-nearest mode, defines -0. + 0. as +0. while leaving every other value unchanged.)

Note also that this isn't due to a rounding error. For example,

np.fix(np.array(-.4)).tobytes().hex()
# '0000000000000080'
np.fix(np.array(-0.)).tobytes().hex()
# '0000000000000080'

That is, the resulting numbers are exactly the same, but

np.fix(np.array(0.)).tobytes().hex()
# '0000000000000000'

is different. This is why your method is not working: it's comparing the binary representation of the numbers, which differs for the two zeros. Therefore, I think the problem is more the method of comparison than the general idea of comparing floating-point numbers for uniqueness.
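
To make this concrete, here is a minimal sketch of the kind of byte-view unique_rows the question describes, with the zeros normalized before the view is taken. The function body is an assumption, since the original code isn't shown; only the + 0. normalization line is the actual fix being suggested:

import numpy as np

def unique_rows(a):
    # Hypothetical reconstruction of the question's approach: view each
    # row as one opaque byte string, then run np.unique over those.
    a = np.ascontiguousarray(a) + 0.   # '+ 0.' turns every -0. into +0.
    b = a.view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
    _, idx = np.unique(b, return_index=True)
    return a[np.sort(idx)]

a = np.array([[ 0., 1.],
              [-0., 1.],   # byte-identical to row 0 after normalization
              [ 1., 2.]])
unique_rows(a)
# array([[ 0.,  1.],
#        [ 1.,  2.]])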

A quick timeit test for the various approaches:

import timeit
import numpy as np

data0 = np.fix(4*np.random.rand(1000000,) - 2)
#   [ 1. -0.  1. -0. -0.  1.  1.  0. -0. -0. .... ]

N = 100
data = np.array(data0)
print(timeit.timeit("data += 0.", setup="from __main__ import np, data", number=N))
#  0.171831846237
data = np.array(data0)
print(timeit.timeit("data[data==0.] = 0.", setup="from __main__ import np, data", number=N))
#  0.83500289917
data = np.array(data0)
print(timeit.timeit("data.astype(np.int64).astype(np.float64)", setup="from __main__ import np, data", number=N))
#  0.843791007996

I agree with @senderle's point that if you want simple and exact comparisons and can get by with ints, ints will generally be easier. But if you want unique floats, you should be able to do that too, though you need to be a bit more careful. The main issue with floats is that small differences can be introduced by calculations and not appear in a normal print, but this isn't a huge barrier, especially not after a round, fix, or rint over a reasonable range of floats.

answered Oct 08 '22 by tom10

I think the fundamental problem is that you're using set-like operations on floating-point numbers -- which is something to avoid as a general rule, unless you have a very good reason and a deep understanding of floating-point numbers.

The obvious reason to follow this rule is that even the tiniest difference between two floats makes them compare as unequal, so numerical error can cause set-like operations to produce unexpected results (the short example below shows how easily this happens). Now, in your use case, it might initially seem that you've avoided that problem by rounding first, thereby limiting the range of possible values. But it turns out that unexpected results are still possible, as this corner case shows. Floating-point numbers are hard to reason about.
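
As a minimal illustration of why exact comparison of computed floats is fragile (this is standard floating-point behavior, not specific to NumPy):

import numpy as np

0.1 + 0.2 == 0.3
# False: 0.1 + 0.2 is actually 0.30000000000000004
np.unique(np.array([0.1 + 0.2, 0.3])).size
# 2 -- two distinct values, even though both print as 0.3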

I think the correct fix is to round and then to convert to int using astype.

>>> import numpy
>>> a = numpy.array([-0.5, 2., 0.2, -3., -0.2])
>>> a
array([-0.5,  2. ,  0.2, -3. , -0.2])
>>> numpy.fix(a)
array([-0.,  2.,  0., -3., -0.])
>>> numpy.fix(a).astype(int)    # could also use 'i8', etc...
array([ 0,  2,  0, -3,  0])

Since you're already rounding, this shouldn't throw away any information, and it will be more stable and predictable for set-like operations later. This is one of those cases where it's best to use the correct abstraction!

If you need floats, you can always convert back. The only problem with this is that it creates another copy; but most of the time that's not really a problem. numpy is fast enough that the overhead of copying is pretty tiny!
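
For instance, a small sketch of the round trip, assuming plain float64 is what you need back:

>>> numpy.fix(a).astype(int).astype(float)
array([ 0.,  2.,  0., -3.,  0.])

Since the values pass through integer zero, both zeros come back as +0.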

I'll add that if your case really demands the use of floats, then tom10's answer is a good one. But I feel that the number of cases in which both floats and set-like operations are genuinely necessary is very small.

answered Oct 08 '22 by senderle