I am building a transition matrix to implement the PageRank algorithm. How can I use NumPy to make sure that the columns add up to one?
For example:
1 1 1
1 1 1
1 1 1
should be normalized to be
.33 .33 .33
.33 .33 .33
.33 .33 .33
Use numpy.ndarray.sum(axis=0) to calculate the sum of each column of a numpy.ndarray. Setting axis to 1 calculates the sum of each row instead.
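A quick check of the axis convention on a small non-square array (the values are arbitrary and only chosen to make the two results easy to tell apart):

```python
import numpy as np

# A 2x3 array with arbitrary illustrative values
a = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = a.sum(axis=0)  # one sum per column -> shape (3,)
row_sums = a.sum(axis=1)  # one sum per row    -> shape (2,)

print(col_sums)  # [5 7 9]
print(row_sums)  # [ 6 15]
```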
Divide the elements of each column by that column's sum:
a / a.sum(axis=0, keepdims=True)  # or simply: a / a.sum(0)
To make the rows sum to one instead, change the axis argument:
a / a.sum(axis=1, keepdims=True)
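Applied to the all-ones matrix from the question (and, for rows, to an arbitrary non-symmetric 2x2 matrix chosen for illustration), the two variants look like this:

```python
import numpy as np

# The 3x3 all-ones matrix from the question
a = np.ones((3, 3))

# Columns sum to one: divide each column by its own sum.
# keepdims=True keeps the sums as shape (1, 3) so they broadcast down each column.
col_norm = a / a.sum(axis=0, keepdims=True)
print(np.round(col_norm, 2))
print(col_norm.sum(axis=0))   # each column now sums to 1

# Rows sum to one: an arbitrary non-symmetric 2x2 example.
# keepdims=True keeps the sums as shape (2, 1) so they broadcast across each row.
m = np.array([[1.0, 3.0],
              [2.0, 2.0]])
row_norm = m / m.sum(axis=1, keepdims=True)
print(row_norm)               # rows [0.25, 0.75] and [0.5, 0.5]
print(row_norm.sum(axis=1))   # each row now sums to 1
```

The shortcut a / a.sum(0) only works for columns: a 1-D sum of shape (n,) broadcasts along the last axis, i.e. across columns. For rows, keepdims=True (or a.sum(1)[:, None]) is needed; without it a square matrix is silently divided by the wrong sums and a non-square one usually raises a broadcasting error.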
Sample run:
In [78]: a = np.random.rand(4,5)
In [79]: a
Out[79]:
array([[ 0.37, 0.74, 0.36, 0.41, 0.44],
[ 0.51, 0.86, 0.91, 0.03, 0.76],
[ 0.56, 0.46, 0.01, 0.86, 0.38],
[ 0.72, 0.66, 0.56, 0.84, 0.69]])
In [80]: b = a/a.sum(axis=0,keepdims=1)
In [81]: b.sum(0) # Verify
Out[81]: array([ 1., 1., 1., 1., 1.])
To make sure this also works on int arrays in Python 2.x, use from __future__ import division or use np.true_divide.
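In Python 3 the / operator already performs true division on NumPy integer arrays, so this mainly matters for legacy Python 2 code. A minimal sketch of the np.true_divide variant on an integer matrix (the values are illustrative):

```python
import numpy as np

# An integer matrix (illustrative values, same shape as the question's example)
a = np.array([[1, 1, 1],
              [1, 1, 1],
              [1, 1, 1]])

# np.true_divide always performs true (floating-point) division, even when
# both operands are integers, so the result is a float array.
b = np.true_divide(a, a.sum(axis=0, keepdims=True))
print(b.dtype)           # float64
print(b.sum(axis=0))     # each column sums to 1
```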
Columns that add up to 0
If some columns add up to 0 and we are okay with leaving them as they are, we can set those sums to 1 rather than dividing by 0, like so:
sums = a.sum(axis=0, keepdims=True)
sums[sums == 0] = 1  # avoid 0/0 for all-zero columns
out = a / sums
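In a PageRank setting, a zero column typically corresponds to a page with no outgoing links. The sketch below assumes a hypothetical link matrix where entry [i, j] is 1 when page j links to page i; the matrix itself is made up for illustration:

```python
import numpy as np

# Hypothetical link matrix: entry [i, j] = 1 if page j links to page i.
# Column 2 is all zeros, i.e. page 2 has no outgoing links.
a = np.array([[0, 1, 0],
              [1, 0, 0],
              [1, 1, 0]], dtype=float)

sums = a.sum(axis=0, keepdims=True)
sums[sums == 0] = 1          # avoid 0/0 for the empty column
out = a / sums

print(out)
print(out.sum(axis=0))       # non-empty columns sum to 1, the empty one stays 0
```

The zero column stays zero, so the result is not fully column-stochastic. One commonly used convention for PageRank, which is not part of the answer above, is to replace such columns with the uniform value 1/n.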