Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Vector sum of multidimensional arrays in numpy

If I have a an N^3 array of triplets in a numpy array, how do I do a vector sum on all of the triplets in the array? For some reason I just can't wrap my brain around the summation indices. Here is what I tried, but it doesn't seem to work:

a = np.random.random((5,5,5,3)) - 0.5
s = a.sum((0,1,2))
np.linalg.norm(s)

I would expect that as N gets large, if the sum is working correctly I should converge to 0, but I just keep getting bigger. The sum gives me a vector that is the correct shape (3x1), but obviously I must be doing something wrong. I know this should be easy, but I'm just not getting it.

Thanks in advance!

like image 779
chadh Avatar asked Nov 15 '13 15:11

chadh


2 Answers

Is is easier to understand you problem analytically if instead of uniform random numbers we use standard normal numbers, and the qualitative results can be applied to your particular case:

>>> a = np.random.normal(0, 1, size=(5, 5, 5, 3))
>>> s = a.sum(axis=(0, 1, 2))

So now each of the three items of s is the sum of 125 numbers, each drawn from a standard normal distribution. It is a well established fact that adding up two normal distributions gives you another normal distribution with mean the sum of the means, and variance the sum of the variances. So each of the three values in s will be distributed as a random sample from a normal distribution with mean 0 and standard deviation sqrt(125) = 11.18.

The fact that the variance of the distribution grows means that, even though if you run your code many times, you will see an average value of 0 for each of those numbers, on any given run you are more likely to see larger offsets from 0.

Furthermore you then go and compute the norm of those three values. Squaring three standard normal distributions and adding them together gives you a chi-squared distribution. If you then take the square root, you get a chi distribution. The former is easier to deal with, and it predicts that the average value of the square of the norm of your three values will be 3 * 125. And it most certainly seems to be:

>>> mean_norm_sq = 0
>>> for n in xrange(1000):
...     a = np.random.normal(0, 1, size=(5, 5, 5, 3))
...     s = a.sum(axis=(0, 1, 2))
...     mean_norm_sq += np.sum(s**2)
... 
>>> mean_norm_sq / 1000
374.47629802482447
like image 76
Jaime Avatar answered Oct 07 '22 09:10

Jaime


As the comments note, there is no reason why the squared sum should approach zero. By the description, an array of N three-dimensional vectors sounds like it should have the shape of (N,3) not (N,N,N,3), but I may be misunderstanding it. Either way, it is simple to observe what happens in the two cases:

import numpy as np

avg_sum = []
sq_sum  = []

N_val = 2**np.arange(15)
for N in N_val:
    A = np.random.random((N,3)) - 0.5
    avg_sum.append( A.sum(axis=1).mean() )
    sq_sum.append ( (A**2).sum(axis=1).mean() )

import pylab as plt
plt.plot(N_val, avg_sum, label="Average sum")
plt.plot(N_val, sq_sum,  label="Squared sum")
plt.legend(loc="best")
plt.show()

enter image description here

The average sum goes to zero as your intuition expects.

like image 43
Hooked Avatar answered Oct 07 '22 11:10

Hooked