Given a pd.DataFrame with 0.0 < values < 1.0, I would like to convert it to binary values 0/1 according to a defined threshold eps = 0.5:
0 1 2
0 0.35 0.20 0.81
1 0.41 0.75 0.59
2 0.62 0.40 0.94
3 0.17 0.51 0.29
Right now, I only have this for loop, which takes quite a long time on a large dataset:
import numpy as np
import pandas as pd
data = np.array([[.35, .2, .81],[.41, .75, .59],
[.62, .4, .94], [.17, .51, .29]])
df = pd.DataFrame(data, index=range(data.shape[0]), columns=range(data.shape[1]))
eps = .5
b = np.zeros((df.shape[0], df.shape[1]))
for i in range(df.shape[0]):
    for j in range(df.shape[1]):
        if df.loc[i,j] < eps:
            b[i,j] = 0
        else:
            b[i,j] = 1
df_bin = pd.DataFrame(b, columns=df.columns, index=df.index)
Does anybody know a more efficient way to convert to binary values? The expected output is:
0 1 2
0 0.0 0.0 1.0
1 0.0 1.0 1.0
2 1.0 0.0 1.0
3 0.0 1.0 0.0
Thanks,
df.round
>>> df.round()
np.round
>>> np.round(df)
astype
>>> df.ge(0.5).astype(int)
All of which yield the following (the astype(int) variant shows integers 0 and 1 rather than 0.0 and 1.0):
0 1 2
0 0.0 0.0 1.0
1 0.0 1.0 1.0
2 1.0 0.0 1.0
3 0.0 1.0 0.0
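Note: round works here because it rounds to the nearest integer, which places the threshold at .5, halfway between 0 and 1. One caveat: a value of exactly .5 rounds to 0 (NumPy and pandas round halves to the nearest even number), whereas the loop in the question maps it to 1. For custom thresholds, use the 3rd solution, replacing 0.5 with eps.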
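As a minimal sketch of the custom-threshold case, the snippet below applies the comparison approach to the question's data. The threshold value 0.6 and the names df_bin and df_bin_np are illustrative assumptions, not part of the original post.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
data = np.array([[.35, .2, .81], [.41, .75, .59],
                 [.62, .4, .94], [.17, .51, .29]])
df = pd.DataFrame(data)

eps = 0.6  # hypothetical custom threshold, chosen only for illustration

# The comparison yields a boolean frame; astype(int) maps False/True to 0/1.
df_bin = df.ge(eps).astype(int)

# Same idea with NumPy: np.where picks 1 where the condition holds, else 0.
df_bin_np = pd.DataFrame(np.where(df.to_numpy() >= eps, 1, 0),
                         index=df.index, columns=df.columns)

# Edge case: a value exactly at 0.5 is rounded down by round()
# but counted as 1 by the >= comparison.
half = pd.DataFrame([[0.5]])
rounded_half = half.round().iloc[0, 0]               # 0.0
compared_half = half.ge(0.5).astype(int).iloc[0, 0]  # 1
```

Both variants operate on the whole frame at once, so no Python-level loop over cells is needed. If float output like 0.0/1.0 is preferred, astype(float) can be used in place of astype(int).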