I'm trying to create a series of dummy variables from a categorical variable using pandas in Python. I've come across the get_dummies function, but whenever I try to call it I get an error saying that the name is not defined.
Any thoughts or other ways to create the dummy variables would be appreciated.
EDIT: Since others seem to be coming across this, the get_dummies
function in pandas now works perfectly fine. This means the following should work:
    import pandas as pd

    dummies = pd.get_dummies(df['Category'])
See http://blog.yhathq.com/posts/logistic-regression-and-python.html for further information.
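For anyone who wants to try this without an existing DataFrame, here is a minimal sketch. The sample values are made up for illustration, and the `Category` column name is taken from the snippet above; `dtype=int` is an optional extra that asks for 0/1 integers instead of booleans.

```python
import pandas as pd

# Hypothetical sample data; only the 'Category' column name comes from the snippet above.
df = pd.DataFrame({'Category': ['a', 'b', 'a', 'c']})

# One indicator column per distinct value; dtype=int requests 0/1 integers.
dummies = pd.get_dummies(df['Category'], dtype=int)

# Attach the indicator columns to the original frame.
df_with_dummies = pd.concat([df, dummies], axis=1)
print(df_with_dummies)
```

If the name still comes up as undefined, it may be worth checking that the call goes through the `pd.` prefix and that the installed pandas version is recent enough to include the function.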
When I think of dummy variables I think of using them in the context of OLS regression, and I would do something like this:
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])

    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])

    step_1 = pd.concat([df, just_dummies], axis=1)
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" and "b" would show effect of "a" and "b"
    # relative to "c"

    # the mixed-type array stores everything as strings, so cast all columns to integers
    # (astype(int) replaces applymap(np.int); the np.int alias was removed from NumPy)
    step_1 = step_1.astype(int)

    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print(result.summary())
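As a possible shortcut for the concat-and-drop steps, `pd.get_dummies` can encode a column in place and drop one level itself. The sketch below is only an illustration of that option, not the approach used in the answer above: it rebuilds the same toy data column by column (so `y` and `x` stay numeric), and `drop_first=True` omits the first level, `a`, rather than `c`.

```python
import pandas as pd

# Same toy data as above, built column by column so 'y' and 'x' stay numeric.
df = pd.DataFrame({
    'y': [5, 3, 1, 3, 4, 7, 7],
    'dummy': ['a', 'b', 'b', 'a', 'b', 'c', 'c'],
    'x': [1, 3, 2, 1, 2, 1, 1],
})

# columns= encodes 'dummy' in place; drop_first=True omits the first level ('a'),
# so coefficients on the remaining columns would be read relative to 'a', not 'c'.
design = pd.get_dummies(df, columns=['dummy'], drop_first=True, dtype=int)
print(design)
```

The resulting columns are prefixed with the source column name (`dummy_b`, `dummy_c`), which helps keep things readable when several categorical variables are encoded at once.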