drop_first=True during dummy variable creation in pandas

Question

I have month data (Jan, Feb, Mar, etc.) in my dataset, and I am generating dummy variables with the pandas library: pd.get_dummies(df['month'], drop_first=True)

I want to understand whether I should use drop_first=True in this case. Why is it important to use drop_first, and for which types of variables?

Soumya · Accepted Answer

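drop_first=True is important because it removes the redundant column created during dummy variable creation. When all n dummy columns are kept, every row sums to 1, so the dummies are perfectly collinear with the intercept of a linear regression (the "dummy variable trap"). Dropping one column removes that exact dependency.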
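A minimal sketch of that collinearity, assuming a hypothetical three-level furnishing column and a design matrix with an intercept column of ones (the helper name design_rank is illustrative, not part of pandas). With all three dummies the design matrix loses a rank; with drop_first=True it has full column rank.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with three furnishing levels
df = pd.DataFrame({"furnishing": ["furnished", "semi_furnished", "unfurnished",
                                  "furnished", "unfurnished", "semi_furnished"]})

full = pd.get_dummies(df["furnishing"], dtype=int)                      # 3 columns
reduced = pd.get_dummies(df["furnishing"], drop_first=True, dtype=int)  # 2 columns

# Every row of the full encoding sums to 1, i.e. it equals the intercept column
print(full.sum(axis=1).unique())  # [1]

def design_rank(dummies):
    # Hypothetical helper: prepend an intercept of ones, as a linear regression would,
    # and return (number of columns, matrix rank)
    X = np.column_stack([np.ones(len(dummies)), dummies.to_numpy()])
    return X.shape[1], int(np.linalg.matrix_rank(X))

print(design_rank(full))     # (4, 3): 4 columns but rank 3, so perfectly collinear
print(design_rank(reduced))  # (3, 3): full column rank
```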
Let's say a categorical column has 3 values (furnished, semi_furnished, unfurnished) and we want to create dummy variables for it. If a row is neither furnished nor semi_furnished, then it is obviously unfurnished, so we do not need a third column to identify unfurnished rows.
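The reasoning above can be checked directly: a minimal sketch, assuming a hypothetical furnishing Series, that keeps only the furnished and semi_furnished columns and recovers the unfurnished column from them.

```python
import pandas as pd

# Hypothetical furnishing column with the three levels from the answer
status = pd.Series(["furnished", "semi_furnished", "unfurnished",
                    "unfurnished", "furnished"])

full = pd.get_dummies(status, dtype=int)

# Keep only furnished and semi_furnished, as in the answer's reasoning
kept = full.drop(columns="unfurnished")

# A row that is neither furnished nor semi_furnished must be unfurnished
recovered = 1 - kept["furnished"] - kept["semi_furnished"]

print(recovered.tolist())                        # [0, 0, 1, 1, 0]
print((recovered == full["unfurnished"]).all())  # True
```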

Hence, if a categorical variable has n levels, we need only n-1 dummy columns to represent it.
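For the month column in the question, a minimal sketch, assuming hypothetical data that covers all 12 months: 12 levels give 11 columns. Because pandas sorts plain string levels alphabetically, the dropped baseline is "Apr", not "Jan". Converting the column to a pd.Categorical with calendar-ordered categories makes "Jan" the baseline instead.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
df = pd.DataFrame({"month": months * 2})  # hypothetical data covering all 12 months

# Plain strings: levels are sorted alphabetically, so "Apr" is dropped
alpha = pd.get_dummies(df["month"], drop_first=True, dtype=int)
print(alpha.shape[1])           # 11
print("Apr" in alpha.columns)   # False

# Categorical with calendar-ordered categories: "Jan" is dropped instead
df["month"] = pd.Categorical(df["month"], categories=months)
cal = pd.get_dummies(df["month"], drop_first=True, dtype=int)
print(list(cal.columns)[:2])    # ['Feb', 'Mar']
```

The choice of baseline does not change how well a linear model fits; it only changes which month the other coefficients are compared against.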

Taeef Najib · Answer

What is drop_first=True?

drop_first=True drops the column for the first level of the variable (pandas sorts the levels, so for a plain string column this is the alphabetically first category). Suppose you have a gender column that contains 4 values: "Male", "Female", "Other", "Unknown". If a person is not "Male", "Female", or "Other", their gender must be "Unknown".
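A minimal sketch, assuming a hypothetical gender Series with the four values above. drop_first=True removes the alphabetically first level ("Female"), not "Unknown"; to make "Unknown" the implied level, drop that column explicitly.

```python
import pandas as pd

# Hypothetical gender column with the four levels from the answer
gender = pd.Series(["Male", "Female", "Other", "Unknown", "Female"])

# drop_first removes the first level in sorted order, which is "Female" here
auto = pd.get_dummies(gender, drop_first=True, dtype=int)
print(auto.columns.tolist())    # ['Male', 'Other', 'Unknown']

# To use "Unknown" as the implied (baseline) level, drop that column explicitly
manual = pd.get_dummies(gender, dtype=int).drop(columns="Unknown")
print(manual.columns.tolist())  # ['Female', 'Male', 'Other']
```

Either version carries the same information; only the baseline category differs.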

We do NOT need another column for "Unknown".

Whether to drop a column depends on the situation. The goal is to remove a column that carries no extra information, but that only works when exactly one category applies to each row. In other situations, we need to keep every column, including the first.

Example

Suppose we have 5 unique values in a column called "Fav_genre": "Rock", "Hip hop", "Pop", "Metal", "Country". When creating dummy variables, we generate 5 columns. In this case, drop_first=True is not applicable: a person may have more than one favorite genre, so the columns do not rule each other out, and dropping any of them would lose information. drop_first=False is pandas' default, so no extra argument is needed here.
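A minimal sketch of the multi-genre case, assuming the hypothetical convention that each cell stores a person's genres as a comma-separated string. Series.str.get_dummies(sep=",") splits each cell and creates one indicator per genre. Because a row can have several 1s, the row sums vary and the columns do not add up to the intercept the way single-label dummies do.

```python
import pandas as pd

# Hypothetical data: each person may list several favorite genres
fav = pd.Series(["Rock,Metal", "Pop", "Hip hop,Pop,Country", "Rock"])

# str.get_dummies splits on the separator and creates one indicator per genre
genres = fav.str.get_dummies(sep=",")
print(genres.columns.tolist())      # ['Country', 'Hip hop', 'Metal', 'Pop', 'Rock']

# Row sums vary, so no column is implied by the others
print(genres.sum(axis=1).tolist())  # [2, 1, 3, 1]
```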

Tags:

python

linear-regression

Snehal Gupta