How to prevent float imprecision from affecting numpy.arange?

Because numpy.arange() uses ceil((stop - start)/step) to determine the number of items, a small float imprecision (for example, stop = .400000001) can add an unintended value to the result.

Example

The first case does not include the stop point (intended)

>>> import numpy as np
>>> print(np.arange(.1,.3,.1))
[0.1 0.2]

The second case includes the stop point (not intended)

>>> print(np.arange(.1,.4,.1))
[0.1 0.2 0.3 0.4]
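To see where the extra element comes from, the length formula quoted above can be evaluated directly in plain Python. This is only a sketch of that formula, not NumPy's actual implementation, and the helper name `arange_length` is made up for the illustration:

```python
import math

def arange_length(start, stop, step):
    """Evaluate ceil((stop - start) / step) in ordinary float arithmetic."""
    return math.ceil((stop - start) / step)

# The quotient lands just below 2, so ceil gives 2 elements.
print((.3 - .1) / .1, arange_length(.1, .3, .1))

# The quotient lands just above 3, so ceil gives 4 elements instead of 3.
print((.4 - .1) / .1, arange_length(.1, .4, .1))
```

In both cases the ideal quotient is an integer, so the direction of a one-ulp rounding error decides the length.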

numpy.linspace() fixes this problem, e.g. np.linspace(.1,.4-.1,3), but requires that you know the number of steps. np.linspace(start,stop-step,np.ceil((stop-step)/step)) leads to the same inconsistencies.

Question

How can I generate a reliable float range without knowing the number of elements in the range?

Extreme Case

Consider the case in which I want to generate a float index of unknown precision:

np.arange(2.00(...)001,2.00(...)021,.00(...)001)


asked Feb 12 '18 21:02

Brendan Frick

1 Answer

Your goal is to calculate what ceil((stop - start)/step) would be if the values had been calculated with exact mathematics.

This is impossible to do given only floating-point values of start, stop, and step that are the results of operations in which some rounding errors may have occurred. Rounding removes information, and there is simply no way to create information from lack of information.
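As a small illustration of that information loss, two different decimal values can convert to the same double, after which nothing in the float tells them apart. The specific decimal strings are chosen for this example and do not come from the question:

```python
# Two distinct decimal values that round to the same IEEE 754 double.
a = float("0.1")
b = float("0.10000000000000000555")

# Once converted, the two inputs are indistinguishable.
print(a == b)
```

Any procedure that receives only the float cannot tell which of the two decimal values was meant.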

Therefore, this problem is only solvable if you have additional information about start, stop, and step.

Suppose step is exact, but start and stop have some accumulated errors bounded by e0 and e1. That is, you know start is at most e0 away from its ideal mathematical value (in either direction), and stop is at most e1 away from its ideal value (in either direction). Then the ideal value of (stop-start)/step could lie anywhere from (stop-start-e0-e1)/step to (stop-start+e0+e1)/step, where start and stop here are the computed floating-point values.

Suppose there is an integer between (stop-start-e0-e1)/step and (stop-start+e0+e1)/step. Then it is impossible to know whether the ideal ceil result should be the lesser integer or the greater just from the floating-point values of start, stop, and step and the bounds e0 and e1.
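The interval argument can be sketched in a few lines of Python. The function names and the bound 1e-15 are assumptions made for illustration. The sketch also assumes step is positive and ignores the tiny rounding in its own arithmetic:

```python
import math

def quotient_bounds(start, stop, step, e0, e1):
    """Interval that should contain the ideal (stop - start) / step.

    Assumes step is exact and positive, start is within e0 of its ideal
    value, and stop is within e1 of its ideal value.
    """
    lo = (stop - start - e0 - e1) / step
    hi = (stop - start + e0 + e1) / step
    return lo, hi

def ceil_is_ambiguous(start, stop, step, e0, e1):
    """True if ceil differs at the two ends of the error interval."""
    lo, hi = quotient_bounds(start, stop, step, e0, e1)
    return math.ceil(lo) != math.ceil(hi)

# Ideal quotient 3: the interval straddles an integer, so ceil is undecidable.
print(ceil_is_ambiguous(.1, .4, .1, 1e-15, 1e-15))

# Ideal quotient 2.5: the whole interval lies between 2 and 3, so ceil is 3.
print(ceil_is_ambiguous(.1, .35, .1, 1e-15, 1e-15))
```

When the check reports ambiguity, the error bounds alone cannot settle the length and more information is needed.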

However, from the examples you have given, the ideal (stop-start)/step could be exactly an integer, as in (.4-.1)/.1. If so, any non-zero error bounds could result in the error interval straddling an integer, making the problem impossible to solve from the information we have so far.

Therefore, in order to solve the problem, you must have more information than just simple bounds on the errors. You must know, for example, that (stop-start)/step is exactly an integer or is otherwise quantized. For example, if you knew that the ideal calculation of the number of steps would produce a multiple of .1, such as 3.8, 3.9, 4.0, 4.1, or 4.2, but never 4.05, and the errors were sufficiently small that the floating-point calculation (stop-start)/step had a final error less than .05, then it would be possible to round (stop-start)/step to the nearest qualifying multiple and then to apply ceil to that.
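A minimal sketch of that rounding strategy follows. It assumes the ideal quotient is known to be a multiple of 1/per_unit (per_unit=10 matches the ".1" example above) and that the floating-point error is well under half of that spacing. The names `quantized_count`, `quantized_range`, and `per_unit` are hypothetical:

```python
def quantized_count(start, stop, step, per_unit=10):
    """Snap (stop - start) / step to the nearest multiple of 1/per_unit,
    then take the ceiling using integer arithmetic only."""
    q = (stop - start) / step
    units = round(q * per_unit)      # quotient expressed in 1/per_unit steps
    return -(-units // per_unit)     # integer ceiling of units / per_unit

def quantized_range(start, stop, step, per_unit=10):
    """Build the range from the snapped element count."""
    n = quantized_count(start, stop, step, per_unit)
    return [start + i * step for i in range(n)]

print(quantized_count(.1, .3, .1))   # ideal quotient 2
print(quantized_count(.1, .4, .1))   # ideal quotient 3
print(quantized_count(0.0, .31, .1)) # ideal quotient 3.1
```

Doing the ceiling on integers avoids reintroducing a rounding error when the snapped value is converted back to a float.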

If you have such information, you can update the question with what you know about the errors in start, stop, and step (e.g., perhaps each of them is the result of a single conversion from decimal to floating-point) and the possible values of the ideal (stop-start)/step. If you do not have such information, there is no solution.
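As one example of such additional information, suppose the ideal values are available as decimal strings rather than as already-rounded floats. That is an assumption the question does not confirm. In that case the count can be computed exactly with rational arithmetic, sketched here with the standard library's `fractions` module and the hypothetical helpers `exact_count` and `exact_range`:

```python
import math
from fractions import Fraction

def exact_count(start, stop, step):
    """ceil((stop - start) / step) computed exactly from decimal strings."""
    q = (Fraction(stop) - Fraction(start)) / Fraction(step)
    return math.ceil(q)

def exact_range(start, stop, step):
    """Generate the range exactly, converting to float only at the end."""
    n = exact_count(start, stop, step)
    s, d = Fraction(start), Fraction(step)
    return [float(s + i * d) for i in range(n)]

print(exact_count("0.1", "0.4", "0.1"))
print(exact_range("0.1", "0.4", "0.1"))
```

Because no rounding happens before the ceiling is taken, the count does not depend on how close the values are to one another. The same approach would apply to inputs with many more digits.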


answered Nov 03 '22 13:11

Eric Postpischil


Tags: python, floating-point, numpy, python-2.7