I am building a transition matrix to implement the PageRank algorithm. How can I use NumPy to make sure that the columns add up to one?
For example:
1 1 1
1 1 1
1 1 1
should be normalized to:
.33 .33 .33
.33 .33 .33
.33 .33 .33
Use numpy.ndarray.sum(axis=0) to calculate the sum of each column of a numpy.ndarray. Setting axis=1 instead calculates the sum of each row.
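A small sketch of the two axis choices; the 2x3 matrix is arbitrary and chosen only so that the column and row sums are easy to check by hand:

```python
import numpy as np

# Arbitrary 2x3 example matrix (illustrative values only)
a = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = a.sum(axis=0)  # sum down each column -> shape (3,)
row_sums = a.sum(axis=1)  # sum across each row  -> shape (2,)

print(col_sums)  # [5 7 9]
print(row_sums)  # [ 6 15]
```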
Use numpy.append() to append a column to a NumPy array: call numpy.append(arr, values, axis=1) with a column vector (shape (n, 1)) as values to append the column to arr. It returns a new array rather than modifying arr in place.
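A minimal sketch of appending one column; the arrays are made up for illustration. The values must have the same number of rows as arr, which is why the new column is written with shape (2, 1) rather than as a flat (2,) array:

```python
import numpy as np

arr = np.array([[1, 2],
                [3, 4]])
col = np.array([[5],
                [6]])  # column vector, shape (2, 1)

# axis=1 appends along the columns; arr itself is left unchanged
out = np.append(arr, col, axis=1)
print(out)
# [[1 2 5]
#  [3 4 6]]
```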
STEP 1: Declare and initialize an array.
STEP 2: Initialize a variable sum to 0; it will hold the running total of the elements.
STEP 3: Loop through the array and add each element to sum, as sum = sum + arr[i].
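The three steps above, sketched in Python with an arbitrary example list. The accumulator is named total here (an illustrative choice) so it does not shadow Python's built-in sum; np.sum gives the same result without an explicit loop:

```python
import numpy as np

# STEP 1: declare and initialize an array (values chosen only for illustration)
arr = [3, 1, 4, 1, 5]

# STEP 2: running total starts at 0
total = 0

# STEP 3: add each element in turn
for i in range(len(arr)):
    total = total + arr[i]

print(total)        # 14
print(np.sum(arr))  # 14, computed by NumPy without a Python-level loop
```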
Divide the elements of each column by that column's sum:
a / a.sum(axis=0, keepdims=True)  # or simply: a / a.sum(0)
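Applied to the 3x3 all-ones matrix from the question, this is a minimal sketch of the column normalization. The short form a / a.sum(0) works for columns because a shape-(3,) array of column sums broadcasts along the last axis, i.e. across each row, so every column is divided by its own sum:

```python
import numpy as np

a = np.ones((3, 3))  # the example matrix from the question

b = a / a.sum(axis=0, keepdims=True)  # each column divided by its sum (3.0)
c = a / a.sum(0)                      # short form, same result for columns

print(b)  # every entry is 1/3
print(np.allclose(b.sum(axis=0), 1))  # True: each column sums to one
```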
To make the rows sum to one instead, change the axis argument:
a / a.sum(axis=1, keepdims=True)
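For rows, keepdims is not optional. This sketch uses an arbitrary non-square 2x3 matrix to show why: keepdims=True keeps the row sums as a (2, 1) column that broadcasts across each row, while without it the (2,) sums are aligned with the last axis (length 3) and NumPy raises a ValueError:

```python
import numpy as np

a = np.array([[1., 2., 3.],
              [4., 5., 6.]])  # illustrative values

# Row sums kept as shape (2, 1), so each row is divided by its own sum
rows = a / a.sum(axis=1, keepdims=True)
print(np.allclose(rows.sum(axis=1), 1))  # True

# Without keepdims the shapes (2, 3) and (2,) cannot be broadcast together
try:
    a / a.sum(axis=1)
except ValueError as err:
    print("broadcast error:", err)
```

For a square matrix the keepdims-free version does not raise; it silently divides column j by the sum of row j, which is a wrong result that is easy to miss.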
Sample run (with NumPy imported as np):
In [78]: a = np.random.rand(4,5)
In [79]: a
Out[79]:
array([[ 0.37, 0.74, 0.36, 0.41, 0.44],
[ 0.51, 0.86, 0.91, 0.03, 0.76],
[ 0.56, 0.46, 0.01, 0.86, 0.38],
[ 0.72, 0.66, 0.56, 0.84, 0.69]])
In [80]: b = a/a.sum(axis=0,keepdims=1)
In [81]: b.sum(0) # Verify
Out[81]: array([ 1., 1., 1., 1., 1.])
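The check above reads the sums by eye. A scripted check is safer with a tolerance, since floating-point column sums can differ from 1 by a rounding error. This sketch assumes NumPy 1.17+ for np.random.default_rng; the seed is arbitrary and only makes the run reproducible:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded random generator
a = rng.random((4, 5))
b = a / a.sum(axis=0, keepdims=True)

# Compare with a tolerance instead of exact equality with 1.0
assert np.allclose(b.sum(axis=0), 1.0)
print(b.sum(axis=0))
```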
To make this normalization work on integer arrays in Python 2.x as well, use from __future__ import division or use np.true_divide. (In Python 3, / on NumPy arrays is always true division.)
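A short sketch of the np.true_divide variant on an integer version of the question's matrix. np.true_divide returns floats even when both inputs are integers, so the result does not depend on the Python version's division semantics:

```python
import numpy as np

a = np.array([[1, 1, 1],
              [1, 1, 1],
              [1, 1, 1]])  # integer dtype

# Always a floating-point result, even for integer inputs
b = np.true_divide(a, a.sum(axis=0, keepdims=True))
print(b.dtype)  # float64
print(b)        # every entry is 1/3
```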
For columns adding up to 0
For columns that add up to 0, assuming we are fine with leaving them as they are, we can set those sums to 1 rather than divide by 0, like so:
sums = a.sum(axis=0, keepdims=True)
sums[sums == 0] = 1  # leave all-zero columns unchanged instead of dividing by 0
out = a / sums
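A self-contained sketch of the zero-sum trick on a made-up matrix whose middle column is all zeros. It also shows an alternative that is not part of the answer above: np.divide with its out= and where= arguments, which skips the division wherever the sum is 0 and leaves the preset zeros in place:

```python
import numpy as np

a = np.array([[1., 0., 2.],
              [3., 0., 2.]])  # middle column sums to 0 (illustrative values)

# Approach from the answer: replace zero sums by 1 before dividing
sums = a.sum(axis=0, keepdims=True)
sums[sums == 0] = 1
out = a / sums

# Alternative: divide only where the column sum is non-zero
raw = a.sum(axis=0, keepdims=True)
out2 = np.divide(a, raw, out=np.zeros_like(a), where=raw != 0)

print(out)   # middle column stays all zeros, the others sum to 1
print(np.allclose(out, out2))  # True
```

For PageRank specifically, a common alternative is to replace an all-zero column (a dangling node with no out-links) by 1/n in every entry so that the matrix stays column-stochastic; whether that is appropriate depends on the model being built.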