
Pandas: Get Dummies

I have the following dataframe:

   amount  catcode    cid      cycle      date     di  feccandid    type
0   1000    E1600   N00029285   2014    2014-05-15  D   H8TX22107   24K
1   5000    G4600   N00026722   2014    2013-10-22  D   H4TX28046   24K
2      4    C2100   N00030676   2014    2014-03-26  D   H0MO07113   24Z

I want to make dummy variables for the values in the column type. There are about 15 of them. I have tried this:

pd.get_dummies(df['type'])

And it returns this:

           24A  24C  24E  24F  24K  24N  24P  24R  24Z
date                                    
2014-05-15  0    0    0    0    1    0    0    0    0
2013-10-22  0    0    0    0    1    0    0    0    0
2014-03-26  0    0    0    0    0    0    0    0    1

What I would like is a dummy variable column for each unique value in type.

Collective Action asked Mar 29 '16 13:03


People also ask

What does pandas get dummies do?

get_dummies() is used for data manipulation. It converts categorical data into dummy or indicator variables.

How do you use pandas get_dummies?

Call get_dummies on a DataFrame column. Optionally, drop the first category, specify a prefix for the dummy variables, or include NA values.
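
The prefix and NA options mentioned above can be sketched as follows. The color column is made-up sample data, not taken from the question. dtype=int is passed on the assumption that 0/1 output is wanted, since recent pandas releases return boolean columns by default.

```python
import pandas as pd

# Hypothetical sample column with one missing value.
colors = pd.Series(["red", "green", None, "red"], name="color")

# Plain call: one indicator column per observed category.
plain = pd.get_dummies(colors, dtype=int)

# prefix= prepends a label to each generated column name.
prefixed = pd.get_dummies(colors, prefix="color", dtype=int)

# dummy_na=True adds one more column that flags missing values.
with_na = pd.get_dummies(colors, dummy_na=True, dtype=int)

print(list(plain.columns))
print(list(prefixed.columns))
print(with_na.shape)
```

Without dummy_na=True, a row with a missing value simply gets 0 in every indicator column.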

How do you convert pandas columns to dummies?

To convert categorical variables to dummy variables in Python, you can use the pandas get_dummies() function. For example, if you have the categorical variable "Gender" in a dataframe called "df", you can use the following code to make dummy variables: df_dc = pd.get_dummies(df, columns=['Gender']).

Why do we use drop_first=True?

drop_first=True drops the indicator column for the first category, so a variable with k categories produces k-1 columns instead of k. The dropped column is redundant because it can be inferred from the others, and removing it avoids perfect correlation among the dummy variables.
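
A minimal sketch of the effect of drop_first, using a made-up size column (not from the question). dtype=int is an assumption so that the output is 0/1 on any pandas version.

```python
import pandas as pd

# Hypothetical categorical column with three levels.
sizes = pd.Series(["S", "M", "L", "M"], name="size")

# Full encoding: one column per level, each row sums to 1.
full = pd.get_dummies(sizes, dtype=int)

# drop_first=True removes the first level in sorted order ("L" here).
reduced = pd.get_dummies(sizes, drop_first=True, dtype=int)

print(list(full.columns))
print(list(reduced.columns))
```

A row of all zeros in the reduced frame then stands for the dropped level.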


2 Answers

You can try:

df = pd.get_dummies(df, columns=['type']) 
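
As a sketch, here is that call applied to the three rows shown in the question. The frame is rebuilt by hand, with the dates kept as plain strings, and dtype=int is an assumption to get 0/1 values rather than the booleans that recent pandas releases return by default.

```python
import pandas as pd

# The three sample rows from the question, rebuilt by hand.
df = pd.DataFrame({
    "amount": [1000, 5000, 4],
    "catcode": ["E1600", "G4600", "C2100"],
    "cid": ["N00029285", "N00026722", "N00030676"],
    "cycle": [2014, 2014, 2014],
    "date": ["2014-05-15", "2013-10-22", "2014-03-26"],
    "di": ["D", "D", "D"],
    "feccandid": ["H8TX22107", "H4TX28046", "H0MO07113"],
    "type": ["24K", "24K", "24Z"],
})

# columns=['type'] encodes only that column and keeps all the others.
result = pd.get_dummies(df, columns=["type"], dtype=int)

print(result.columns.tolist())
```

The original type column is replaced by type_24K and type_24Z, appended after the untouched columns. Only the values present in the data get a column, so the full dataset would produce one column per distinct type.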

Till answered Sep 24 '22 18:09


Suppose I have the following dataframe:

   Survived  Pclass     Sex   Age     Fare
0         0       3    male  22.0   7.2500
1         1       1  female  38.0  71.2833
2         1       3  female  26.0   7.9250
3         1       1  female  35.0  53.1000
4         0       3    male  35.0   8.0500

There are two ways to call get_dummies:

Method 1:

one_hot = pd.get_dummies(dataset, columns = ['Sex'])

This will return:

   Survived  Pclass   Age     Fare  Sex_female  Sex_male
0         0       3  22.0   7.2500           0         1
1         1       1  38.0  71.2833           1         0
2         1       3  26.0   7.9250           1         0
3         1       1  35.0  53.1000           1         0
4         0       3  35.0   8.0500           0         1

Method 2:

one_hot = pd.get_dummies(dataset['Sex'])

This will return:

   female  male
0       0     1
1       1     0
2       1     0
3       1     0
4       0     1
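
Method 2 returns only the indicator columns, so they have to be joined back to the rest of the frame by hand. One possible way to do that, assuming pd.concat and a "Sex" prefix chosen to match Method 1's column names:

```python
import pandas as pd

# The sample frame from above, rebuilt by hand.
dataset = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# Method 2: encode the single column (dtype=int is an assumption for 0/1 output).
dummies = pd.get_dummies(dataset["Sex"], prefix="Sex", dtype=int)

# Drop the original column and attach the indicators side by side.
combined = pd.concat([dataset.drop(columns="Sex"), dummies], axis=1)

# Method 1 for comparison.
method1 = pd.get_dummies(dataset, columns=["Sex"], dtype=int)

print(combined.equals(method1))
```

Method 1 is the shorter route when the dummies should stay in the same frame. Method 2 is handy when only the indicator columns are needed.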

user41855 answered Sep 25 '22 18:09