Midpoint of each pair of adjacent values in a numpy.array

Tags: python, numpy, mean

I have an array of the form:

x = np.array([ 1230., 1230., 1227., 1235., 1217., 1153., 1170.])

and I would like to produce another array whose values are the means of each pair of adjacent values in my original array:

xm = np.array([ 1230., 1228.5, 1231., 1226., 1185., 1161.5])

Does anyone know the easiest and fastest way to do this without using loops?

iury simoes-sousa asked May 25 '14 13:05



2 Answers

Even shorter, slightly sweeter:

(x[1:] + x[:-1]) / 2 
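As a quick check, here is a minimal sketch that applies this expression to the array from the question. The variable names `x` and `xm` are taken from the question; everything else is just illustration.

```python
import numpy as np

x = np.array([1230., 1230., 1227., 1235., 1217., 1153., 1170.])

# x[1:] drops the first element and x[:-1] drops the last, so adding
# them pairs every element with its right-hand neighbour.
xm = (x[1:] + x[:-1]) / 2

print(xm)
```

The result has one element fewer than `x`, and it should match the `xm` array given in the question.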

  • This is faster:

    $ python -m timeit -s "import numpy; x = numpy.random.random(1000000)" "x[:-1] + numpy.diff(x)/2"
    100 loops, best of 3: 6.03 msec per loop

    $ python -m timeit -s "import numpy; x = numpy.random.random(1000000)" "(x[1:] + x[:-1]) / 2"
    100 loops, best of 3: 4.07 msec per loop

  • This is perfectly accurate:

    Consider each element in x[1:] + x[:-1]. So consider x₀ and x₁, the first and second elements.

    x₀ + x₁ is calculated to perfect precision and then rounded, in accordance with IEEE 754. It would therefore be the correct answer if that was all that was needed.

    (x₀ + x₁) / 2 is just half of that value. This can almost always be done by reducing the exponent by one, except in two cases:

    • x₀ + x₁ overflows. This will result in an infinity (of either sign). That's not what is wanted, so the calculation will be wrong.

    • x₀ + x₁ underflows. As the size is reduced, rounding will be perfect and thus the calculation will be correct.

    In all other cases, the calculation will be correct.


    Now consider x[:-1] + numpy.diff(x) / 2. This, by inspection of the source, evaluates directly to

    x[:-1] + (x[1:] - x[:-1]) / 2 

    and so consider again x₀ and x₁.

    x₁ - x₀ will have severe "problems" with underflow for many values, and it loses precision when large cancellations occur. It isn't immediately obvious, but this doesn't matter when the signs are the same, because the error effectively cancels out on the addition. What does matter is that rounding occurs.

    (x₁ - x₀) / 2 will be no less rounded, but then x₀ + (x₁ - x₀) / 2 involves another rounding. This means that errors will creep in. Proof:

    import numpy

    wins = draws = losses = 0

    for _ in range(100000):
        a = numpy.random.random()
        b = numpy.random.random() / 0.146

        x = (a+b)/2
        y = a + (b-a)/2

        error_mine   = (a-x) - (x-b)
        error_theirs = (a-y) - (y-b)

        if x != y:
            if abs(error_mine) < abs(error_theirs):
                wins += 1
            elif abs(error_mine) == abs(error_theirs):
                draws += 1
            else:
                losses += 1
        else:
            draws += 1

    wins / 1000
    #>>> 12.44

    draws / 1000
    #>>> 87.56

    losses / 1000
    #>>> 0.0

    This shows that for the carefully chosen constant of 0.146, a full 12-13% of answers are wrong with the diff variant! As expected, my version is always right.

    Now consider underflow. Although my variant has overflow problems, these are much less of a problem than cancellation. It should be obvious why the double rounding from the above logic is very problematic. Proof:

    ...
        a = numpy.random.random()
        b = -numpy.random.random()
    ...

    wins / 1000
    #>>> 25.149

    draws / 1000
    #>>> 74.851

    losses / 1000
    #>>> 0.0

    Yeah, it gets 25% wrong!

    In fact, it doesn't take much pruning to get this up to 50%:

    ...
        a = numpy.random.random()
        b = -a + numpy.random.random()/256
    ...

    wins / 1000
    #>>> 49.188

    draws / 1000
    #>>> 50.812

    losses / 1000
    #>>> 0.0

    Well, it's not that bad. It's only ever 1 least-significant-bit off as long as the signs are the same, I think.
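One way to cross-check the accuracy argument is to compare both formulas against an exact rational midpoint. The sketch below is not part of the original experiment: the helper name `count_misrounded`, the fixed seed, and the sample size are all assumptions made for illustration, and it uses the standard library's `fractions.Fraction` as the reference rather than the error terms above.

```python
import random
from fractions import Fraction


def count_misrounded(pairs):
    """Count how often each formula differs from the correctly rounded midpoint."""
    sum_first = diff_first = 0
    for a, b in pairs:
        # Exact rational midpoint, converted to the nearest double once.
        exact = float((Fraction(a) + Fraction(b)) / 2)
        if (a + b) / 2 != exact:
            sum_first += 1
        if a + (b - a) / 2 != exact:
            diff_first += 1
    return sum_first, diff_first


rng = random.Random(0)  # fixed seed, an arbitrary choice for repeatability
# Opposite-sign inputs, mirroring the second experiment above.
pairs = [(rng.random(), -rng.random()) for _ in range(10_000)]

sum_first, diff_first = count_misrounded(pairs)
print("(a + b) / 2 misrounded:    ", sum_first)
print("a + (b - a) / 2 misrounded:", diff_first)
```

For inputs in this range the sum-first count should come out as zero, since neither the addition nor the halving needs to round. The diff-first count depends on the sample, so no particular figure is claimed here.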


So there you have it. My answer is the best unless you're finding the average of two values whose sum exceeds 1.7976931348623157e+308 or is smaller than -1.7976931348623157e+308.
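The overflow caveat can be seen with a two-element array at the top of the double range. This is a small sketch, assuming only that `sys.float_info.max` is the 1.7976931348623157e+308 limit quoted above; the names `big`, `sum_first` and `diff_first` are made up for the example.

```python
import sys

import numpy as np

big = sys.float_info.max  # largest finite double, 1.7976931348623157e+308
x = np.array([big, big])

# Silence the overflow RuntimeWarning that NumPy would otherwise emit.
with np.errstate(over="ignore"):
    sum_first = (x[1:] + x[:-1]) / 2   # big + big overflows to inf first

diff_first = x[:-1] + np.diff(x) / 2   # the difference is 0.0, so no overflow

print(sum_first, diff_first)
```

Here the sum-first form returns infinity while the diff form returns the finite midpoint, which is the one situation where the diff variant is the safer choice.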

Veedrac answered Oct 13 '22 16:10


Short and sweet:

x[:-1] + np.diff(x)/2 

That is, take each element of x except the last, and add one-half of the difference between it and the subsequent element.
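A minimal sketch of the same idea on the question's data, spelling out that `np.diff(x)` is the first difference `x[1:] - x[:-1]`. The intermediate name `step` is only there for illustration.

```python
import numpy as np

x = np.array([1230., 1230., 1227., 1235., 1217., 1153., 1170.])

# np.diff(x) is the first difference x[1:] - x[:-1]; it is one element
# shorter than x, so it lines up with x[:-1].
step = np.diff(x)

# Start from each element (except the last) and move halfway to the next.
xm = x[:-1] + step / 2

print(xm)
```

For these values every intermediate result is exactly representable, so the output should agree with the `xm` array in the question.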

John Zwinck answered Oct 13 '22 14:10