Creating dummy variables in pandas for python

Tags:

python

pandas

I'm trying to create a series of dummy variables from a categorical variable using pandas in Python. I've come across the get_dummies function, but whenever I try to call it I get an error saying the name is not defined.

Any thoughts or other ways to create the dummy variables would be appreciated.

EDIT: Since others seem to be coming across this, the get_dummies function in pandas now works perfectly fine. This means the following should work:

    import pandas as pd

    dummies = pd.get_dummies(df['Category'])

See http://blog.yhathq.com/posts/logistic-regression-and-python.html for further information.
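
As a minimal sketch of that call on a small frame: the column name Category comes from the question, while the values are invented for illustration. The original traceback isn't shown, so the cause is an assumption, but a "name is not defined" error is what Python raises when get_dummies is called without the pd. prefix or before pandas has been imported.

```python
import pandas as pd

# Hypothetical data; only the column name 'Category' comes from the question.
df = pd.DataFrame({'Category': ['red', 'green', 'red', 'blue']})

# get_dummies lives in the pandas namespace, so call it through `pd`.
# dtype=int requests 0/1 columns (recent pandas versions default to booleans).
dummies = pd.get_dummies(df['Category'], dtype=int)

# Attach the indicator columns to the original frame.
df_with_dummies = pd.concat([df, dummies], axis=1)
print(df_with_dummies)
```

Each row has exactly one indicator set, and the new columns are named after the category levels.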

user1074057 asked Jul 20 '12 22:07



1 Answer

When I think of dummy variables I think of using them in the context of OLS regression, and I would do something like this:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])

    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])

    step_1 = pd.concat([df, just_dummies], axis=1)

    # To run the regression we need to get rid of the string column 'dummy',
    # and we drop one dummy variable to avoid the dummy variable trap.
    # 'c' is chosen arbitrarily; the coefficients on 'a' and 'b' then show
    # the effect of 'a' and 'b' relative to 'c'.
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)

    # The mixed-type NumPy array stored every value as a string, so cast back
    # to integers (np.int was removed from NumPy; the builtin int works).
    step_1 = step_1.astype(int)

    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print(result.summary())
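
To make the dummy variable trap mentioned in the comments concrete, here is a rough sketch using only NumPy and pandas. It reuses the toy data from the answer (built from a dict here so that y and x stay numeric, which is a choice made for this illustration) and checks the rank of the design matrix with and without the full set of dummies. The variable names X_trap and X_ok are made up for this example.

```python
import numpy as np
import pandas as pd

# Same toy values as the answer, built from a dict so 'y' and 'x' stay numeric.
df = pd.DataFrame({
    'y': [5, 3, 1, 3, 4, 7, 7],
    'dummy': ['a', 'b', 'b', 'a', 'b', 'c', 'c'],
    'x': [1, 3, 2, 1, 2, 1, 1],
})

all_dummies = pd.get_dummies(df['dummy'], dtype=int)
const = np.ones(len(df))

# Constant plus ALL three dummies: a + b + c equals the constant column,
# so the columns are linearly dependent (the dummy variable trap).
X_trap = np.column_stack([const, df['x'].to_numpy(), all_dummies.to_numpy()])

# Leaving out one level ('c', as in the answer) removes the dependency.
X_ok = np.column_stack([const, df['x'].to_numpy(),
                        all_dummies[['a', 'b']].to_numpy()])

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)
print("columns:", X_trap.shape[1], "rank:", rank_trap)
print("columns:", X_ok.shape[1], "rank:", rank_ok)

# get_dummies can also drop a level itself; drop_first=True removes the
# first level in sorted order ('a' here), not 'c'.
reduced = pd.get_dummies(df['dummy'], drop_first=True, dtype=int)
print(list(reduced.columns))
```

With all three dummies the matrix has more columns than its rank, so the coefficients are not uniquely determined; with one level removed the matrix has full column rank. Note that drop_first=True changes the baseline category to 'a', so the coefficients would be read relative to 'a' rather than 'c'.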
Akavall answered Sep 29 '22 08:09
