I have the following DataFrame:
df = pd.DataFrame(['Male','Female', 'Female', 'Unknown', 'Male'], columns = ['Gender'])
I want to convert this to a DataFrame with columns 'Male','Female' and 'Unknown' the values 0 and 1 indicated the Gender.
Gender Male Female
Male 1 0
Female 0 1
.
.
.
.
To do this, I wrote a function and called the function using map.
def isValue(x , value):
if(x == value):
return 1
else:
return 0
for value in df['Gender'].unique():
df[str(value)] = df['Gender'].map( lambda x: isValue(str(x) , str(value)))
Which works perfectly. But is there a better way to do this? Is there an inbuilt function in any of sklearn package that I can use?
The values property is used to get a Numpy representation of the DataFrame. Only the values in the DataFrame will be returned, the axes labels will be removed. The values of the DataFrame. A DataFrame where all columns are the same type (e.g., int64) results in an array of the same type.
In order to check missing values in Pandas DataFrame, we use a function isnull() and notnull(). Both function help in checking whether a value is NaN or not. These function can also be used in Pandas Series in order to find null values in a series.
Similarly to loc, at provides label based scalar lookups. You can also set using these indexers.
Select Cell Value from DataFrame Using df['col_name']. values[] We can use df['col_name']. values[] to get 1×1 DataFrame as a NumPy array, then access the first and only value of that array to get a cell value, for instance, df["Duration"].
Yes, there is a better way to do this. It's called pd.get_dummies
pd.get_dummies(df)
To replicate what you have:
order = ['Gender', 'Male', 'Female', 'Unknown']
pd.concat([df, pd.get_dummies(df, '', '').astype(int)], axis=1)[order]
My preference is pd.get_dummies()
. Yes, there is sklearn method.
From Docs:
>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder()
>>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])
OneHotEncoder(categorical_features='all', dtype=<... 'float'>,
handle_unknown='error', n_values='auto', sparse=True)
>>> enc.n_values_
array([2, 3, 4])
>>> enc.feature_indices_
array([0, 2, 5, 9])
>>> enc.transform([[0, 1, 1]]).toarray()
array([[ 1., 0., 0., 1., 0., 0., 1., 0., 0.]])
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With