
How to binarize the values in a pandas DataFrame?

Tags:

python

pandas

dataframe

scikit-learn

I have the following DataFrame:


df = pd.DataFrame(['Male','Female', 'Female', 'Unknown', 'Male'], columns = ['Gender'])

I want to convert this to a DataFrame with the columns 'Male', 'Female' and 'Unknown', where the values 0 and 1 indicate the gender.


Gender  Male  Female
Male     1      0
Female   0      1
       .
       .
       .
       .

To do this, I wrote a function and called the function using map.


def isValue(x, value):
    if x == value:
        return 1
    else:
        return 0


for value in df['Gender'].unique():
    df[str(value)] = df['Gender'].map( lambda x: isValue(str(x) , str(value)))

This works perfectly, but is there a better way to do it? Is there a built-in function in any sklearn package that I can use?


asked Aug 01 '16 17:08

Rakesh Adhikesavan

2 Answers

Yes, there is a better way to do this: pd.get_dummies.


pd.get_dummies(df)
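
With the question's data, this call returns one indicator column per distinct value, each named with the original column as a prefix. A minimal sketch, assuming a pandas version whose get_dummies accepts the dtype argument; it is passed here only to force 0/1 integers, since recent pandas releases return booleans by default:

```python
import pandas as pd

df = pd.DataFrame(['Male', 'Female', 'Female', 'Unknown', 'Male'], columns=['Gender'])

# dtype=int forces 0/1 output; without it, recent pandas versions return True/False.
dummies = pd.get_dummies(df, dtype=int)

# Column names are "<original column>_<value>", with the values in sorted order.
print(dummies.columns.tolist())
# ['Gender_Female', 'Gender_Male', 'Gender_Unknown']
print(dummies)
```

The original Gender column is consumed by the encoding, so it has to be joined back in if it is still needed next to the indicators.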


To replicate what you have:


order = ['Gender', 'Male', 'Female', 'Unknown']
pd.concat([df, pd.get_dummies(df, prefix='', prefix_sep='').astype(int)], axis=1)[order]



answered Nov 10 '22 21:11

piRSquared

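My preference is pd.get_dummies(), but yes, scikit-learn has a method for this too: OneHotEncoder.

From the docs (an older scikit-learn release; n_values_ and feature_indices_ were removed in later versions):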


>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder()
>>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])  
OneHotEncoder(categorical_features='all', dtype=<... 'float'>,
       handle_unknown='error', n_values='auto', sparse=True)
>>> enc.n_values_
array([2, 3, 4])
>>> enc.feature_indices_
array([0, 2, 5, 9])
>>> enc.transform([[0, 1, 1]]).toarray()
array([[ 1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.]])

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
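
Applied to the Gender column from the question, the encoder could be used roughly as follows. This is a sketch rather than the documented example: it assumes scikit-learn 0.20 or later, where OneHotEncoder accepts string columns directly and exposes the learned labels through categories_. The names labels and result are illustrative only.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame(['Male', 'Female', 'Female', 'Unknown', 'Male'], columns=['Gender'])

# The encoder expects 2-D input, hence df[['Gender']] rather than df['Gender'].
enc = OneHotEncoder(dtype=int)
encoded = enc.fit_transform(df[['Gender']]).toarray()  # default output is sparse

# categories_ holds one array of sorted labels per input column.
labels = list(enc.categories_[0])
result = pd.DataFrame(encoded, columns=labels, index=df.index)

print(labels)
# ['Female', 'Male', 'Unknown']
print(pd.concat([df, result], axis=1))
```

Unlike pd.get_dummies, the fitted encoder remembers its categories, so enc.transform can encode new rows with the same column layout.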

answered Nov 10 '22 21:11

Merlin
