Is this an efficient, or even correct, way to divide every cell in each column of a table by the maximum value in that column? If it is correct, is there a better implementation? Note: all values are >= 0.
new_data = [];
for row in np.transpose(data)[1::]: #from 1 till end
    for elements in row:
        if sum(elements) != 0:
            new_data.append(elements/max(row));
        else:
            new_data.append(0);
new_data = np.transpose(new_data);
Now:
id  col1  col2  col3  col4
A   2     1     4     0
B   3     8     2     0
C   2     3     0     0
D   5     5     3     0
E   6     3     3     0
Required:
id  col1  col2  col3  col4
A   1/3   1/8   1     0
B   1/2   1     1/2   0
C   1/3   3/8   0     0
D   5/6   5/8   3/4   0
E   1     3/8   3/4   0
How do you handle 0, as in the last column? In theory the result should be nan. (About the sum(elements) != 0 check: what if a column is -2 -1 0 1 2? That should result in -1 -0.5 0 0.5 1, right?)
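A quick check of the negative-value case raised in that comment, as a minimal sketch. The array `col` is just the hypothetical column from the comment, not data from the question:

```python
import numpy as np

# Hypothetical column from the comment: it sums to zero but is not all zeros.
col = np.array([-2, -1, 0, 1, 2])

col_sum = col.sum()             # 0, so a `sum != 0` test would skip this column
scaled_col = col / col.max()    # the max is 2, so the division is well defined

print(col_sum)
print(scaled_col)               # values: -1, -0.5, 0, 0.5, 1
```

So testing the sum is only a safe guard when every value is non-negative, as the question states. Testing the column maximum directly avoids relying on that.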
In [138]:
A*1./np.max(A, axis=0)
Out[138]:
array([[ 0.33333333,  0.125     ,  1.        ,         nan],
       [ 0.5       ,  1.        ,  0.5       ,         nan],
       [ 0.33333333,  0.375     ,  0.        ,         nan],
       [ 0.83333333,  0.625     ,  0.75      ,         nan],
       [ 1.        ,  0.375     ,  0.75      ,         nan]])
If an all-zero column should stay as zeros rather than become nan, we can leave it as it is with np.where:
In [141]:
np.where(np.max(A, axis=0)==0, A, A*1./np.max(A, axis=0))
Out[141]:
array([[ 0.33333333,  0.125     ,  1.        ,  0.        ],
       [ 0.5       ,  1.        ,  0.5       ,  0.        ],
       [ 0.33333333,  0.375     ,  0.        ,  0.        ],
       [ 0.83333333,  0.625     ,  0.75      ,  0.        ],
       [ 1.        ,  0.375     ,  0.75      ,  0.        ]])
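Note that the np.where version still evaluates the 0/0 division for the last column before discarding it, so NumPy may emit a RuntimeWarning. One possible alternative, sketched here under the question's assumption that all values are >= 0, is to pass `out` and `where` to np.divide so the division is skipped for those columns. The names `col_max` and `scaled` are only illustrative:

```python
import numpy as np

# Sample table from the question, without the id column.
A = np.array([[2, 1, 4, 0],
              [3, 8, 2, 0],
              [2, 3, 0, 0],
              [5, 5, 3, 0],
              [6, 3, 3, 0]])

col_max = A.max(axis=0)  # per-column maxima: 6, 8, 4, 0

# Divide only in columns whose maximum is nonzero. Everywhere else the
# result keeps the zeros that `out` was pre-filled with, so 0/0 is never
# evaluated. Assumes non-negative data, so max == 0 means an all-zero column.
scaled = np.divide(A, col_max,
                   out=np.zeros(A.shape, dtype=float),
                   where=col_max != 0)
print(scaled)
```

With non-negative data this should give the same values as the np.where result above. With negative values the two differ, because np.where keeps the original entries of a column whose maximum is 0 while this sketch writes zeros there.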
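The correct way of doing it with a loop is:
new_data = []
for col in A.T:  # each row of A.T is a column of A
    col_max = max(col)
    if col_max > 0:
        new_data.append([item*1./col_max for item in col])
    else:
        new_data.append(col)  # all-zero column: leave unchanged
new_data = np.transpose(new_data)  # back to the original orientation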