I have several data sets with 75,000 observations and a type variable that can take on a value 0-4. I want to add five new dummy variables to each data set, one for each type. The best way I could come up with to do this is as follows:
    # For the 'binom' data set create dummy variables for all types in all data sets
    binom.dummy.list<-list()
    for(i in 0:4){
      binom.dummy.list[[i+1]]<-sapply(binom$type,function(t) ifelse(t==i,1,0))
    }
    # Add and merge data
    binom.dummy.df<-as.data.frame(do.call("cbind",binom.dummy.list))
    binom.dummy.df<-transform(binom.dummy.df,id=1:nrow(binom))
    binom<-merge(binom,binom.dummy.df,by="id")
While this works, it is incredibly slow (the merge function has even crashed a few times). Is there a more efficient way to do this? Perhaps this functionality is part of a package that I am not familiar with?
To convert your categorical variables to dummy variables in Python, you can use the pandas get_dummies() function. For example, if you have the categorical variable "Gender" in a dataframe called "df", you can use the following code to make dummy variables: df_dc = pd.get_dummies(df, columns=['Gender']).
To convert categorical variables to dummy variables in the tidyverse, use the spread() function. It takes three arguments: key, which is the column to convert into categorical values (in this case, "Reporting Airline"); value, which is the value you want to set the key to (in this case, "dummy");
R has a "sub-language" for translating formulas into a design matrix, and in the spirit of the language you can take advantage of it. It's fast and concise. Example: you have a numeric predictor x, a categorical predictor catVar, and a response y.
    > binom <- data.frame(y=runif(1e5), x=runif(1e5), catVar=as.factor(sample(0:4,1e5,TRUE)))
    > head(binom)
              y          x catVar
    1 0.5051653 0.34888390      2
    2 0.4868774 0.85005067      2
    3 0.3324482 0.58467798      2
    4 0.2966733 0.05510749      3
    5 0.5695851 0.96237936      1
    6 0.8358417 0.06367418      2
You just do
    > A <- model.matrix(y ~ x + catVar,binom)
    > head(A)
      (Intercept)          x catVar1 catVar2 catVar3 catVar4
    1           1 0.34888390       0       1       0       0
    2           1 0.85005067       0       1       0       0
    3           1 0.58467798       0       1       0       0
    4           1 0.05510749       0       0       1       0
    5           1 0.96237936       1       0       0       0
    6           1 0.06367418       0       1       0       0
Done.
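Note that the matrix above has only four catVar columns, because the first level is absorbed into the intercept. If you want all five dummies attached to the original data frame, as the question asks, something like the following should work. This is only a sketch: the toy data and the column names type0 to type4 are my own assumptions, not from the question.

```r
# Toy data standing in for the question's 'binom' data set (hypothetical values)
set.seed(1)
binom <- data.frame(id = 1:10, type = sample(0:4, 10, replace = TRUE))

# Make 'type' a factor with all five levels, so a level that happens to be
# absent from the data still gets a column
type_f <- factor(binom$type, levels = 0:4)

# '- 1' drops the intercept, so no level is absorbed into the baseline
dummies <- model.matrix(~ type_f - 1)
colnames(dummies) <- paste0("type", 0:4)  # assumed naming scheme

# Rows stay in the original order, so cbind() is enough; no merge() needed
binom <- cbind(binom, dummies)
```

Since nothing is re-sorted or joined by id, this avoids the merge step that was slow in the question.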