I read a csv file into a pandas dataframe, and would like to convert the columns with binary answers from strings of yes/no to integers of 1/0. Below, I show one of such columns ("sampleDF" is the pandas dataframe).
In [13]: sampleDF.housing[0:10] Out[13]: 0 no 1 no 2 yes 3 no 4 no 5 no 6 no 7 no 8 yes 9 yes Name: housing, dtype: object
Help is much appreciated!
You can replace values of all or selected columns based on the condition of pandas DataFrame by using DataFrame. loc[ ] property. The loc[] is used to access a group of rows and columns by label(s) or a boolean array. It can access and can also manipulate the values of pandas DataFrame.
You can use the rename() method of pandas. DataFrame to change column/index name individually. Specify the original name and the new name in dict like {original name: new name} to columns / index parameter of rename() . columns is for the column name, and index is for the index name.
method 1
sample.housing.eq('yes').mul(1)
method 2
pd.Series(np.where(sample.housing.values == 'yes', 1, 0), sample.index)
method 3
sample.housing.map(dict(yes=1, no=0))
method 4
pd.Series(map(lambda x: dict(yes=1, no=0)[x], sample.housing.values.tolist()), sample.index)
method 5
pd.Series(np.searchsorted(['no', 'yes'], sample.housing.values), sample.index)
All yield
0 0 1 0 2 1 3 0 4 0 5 0 6 0 7 0 8 1 9 1
timing
given sample
timing
long samplesample = pd.DataFrame(dict(housing=np.random.choice(('yes', 'no'), size=100000)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With