
Pandas DataFrame convert to binary

Tags: python, pandas

Given a pd.DataFrame whose values all lie strictly between 0.0 and 1.0, I would like to convert it to binary values (0/1) using a threshold eps = 0.5: values below eps become 0, and values at or above eps become 1.

      0     1     2
0  0.35  0.20  0.81
1  0.41  0.75  0.59
2  0.62  0.40  0.94
3  0.17  0.51  0.29

Right now I only have this nested for loop, which takes a long time on large datasets:

import numpy as np
import pandas as pd

data = np.array([[.35, .2, .81],[.41, .75, .59],
                [.62, .4, .94], [.17, .51, .29]])

df = pd.DataFrame(data, index=range(data.shape[0]), columns=range(data.shape[1]))
eps = .5
b = np.zeros((df.shape[0], df.shape[1]))
for i in range(df.shape[0]):
    for j in range(df.shape[1]):
        if df.loc[i,j] < eps:
            b[i,j] = 0
        else:
            b[i,j] = 1
df_bin = pd.DataFrame(b, columns=df.columns, index=df.index)
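The loop sets an element to 0 when it is below eps and to 1 otherwise, which is a single elementwise comparison. Here is a minimal sketch of the same logic vectorized with NumPy. It keeps the loop's float output, and the name df_bin_vec is only illustrative.

```python
import numpy as np
import pandas as pd

data = np.array([[.35, .2, .81], [.41, .75, .59],
                 [.62, .4, .94], [.17, .51, .29]])
df = pd.DataFrame(data)  # default 0..n-1 index and columns, as in the question
eps = .5

# value >= eps -> True -> 1.0, value < eps -> False -> 0.0,
# matching the if/else branches of the loop
b = (df.to_numpy() >= eps).astype(float)
df_bin_vec = pd.DataFrame(b, columns=df.columns, index=df.index)
print(df_bin_vec)
```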

Is there a more efficient way to convert the values to binary? The expected output is:

     0    1    2
0  0.0  0.0  1.0
1  0.0  1.0  1.0
2  1.0  0.0  1.0
3  0.0  1.0  0.0

Thanks,

asked Dec 01 '22 at 09:12 by Färid Alijani


1 Answer

df.round

>>> df.round()

np.round

>>> np.round(df)

astype

>>> df.ge(0.5).astype(int)

All of these yield the following. The astype version returns the integers 0/1 instead of the floats 0.0/1.0.

     0    1    2
0  0.0  0.0  1.0
1  0.0  1.0  1.0
2  1.0  0.0  1.0
3  0.0  1.0  0.0
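The following self-contained sketch rebuilds the question's frame and runs the three one-liners side by side. On this data they agree value for value. They differ only in dtype: both round variants keep floats, and astype(int) returns integers.

```python
import numpy as np
import pandas as pd

data = np.array([[.35, .2, .81], [.41, .75, .59],
                 [.62, .4, .94], [.17, .51, .29]])
df = pd.DataFrame(data)

by_df_round = df.round()               # nearest integer, stays float64
by_np_round = np.round(df)             # NumPy dispatches to DataFrame.round
by_threshold = df.ge(0.5).astype(int)  # boolean mask converted to int 0/1

# Elementwise comparison: float 1.0 == int 1, so equal values compare True
print((by_df_round == by_threshold).all().all())
print(by_df_round.dtypes.unique(), by_threshold.dtypes.unique())
```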

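Note: round works here because every value lies between 0 and 1, so rounding to the nearest integer acts as a threshold at .5. NumPy and pandas round exact halves to the nearest even integer, so a value of exactly 0.5 becomes 0.0. By contrast, ge(0.5) maps it to 1, as the question's loop does. For custom thresholds, or to match the loop exactly, use the third solution.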
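With a custom threshold, the comparison approach takes eps directly. This sketch uses eps = 0.3, an arbitrary value chosen only for illustration. The frame includes a value of exactly 0.5 to show where round and ge(0.5) diverge. np.where is shown as an equivalent NumPy alternative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0.25, 0.5], [0.75, 0.3]])

# Custom threshold: values >= eps become 1, the rest 0
eps = 0.3  # illustrative value, not from the question
custom = df.ge(eps).astype(int)
# Same result via np.where, wrapped back into a DataFrame with the same labels
custom_np = pd.DataFrame(np.where(df >= eps, 1, 0), index=df.index, columns=df.columns)

# Tie at exactly 0.5: round-half-to-even gives 0.0, while ge(0.5) gives 1
print(df.round().loc[0, 1], df.ge(0.5).astype(int).loc[0, 1])
```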

answered Dec 04 '22 at 01:12 by rafaelc