I have a variable that is a factor:
$ year : Factor w/ 8 levels "2003","2004",..: 4 6 4 2 4 1 3 3 7 2 ...
I would like to create 8 dummy variables, named "2003", "2004", etc., that take the value 0 or 1 depending on the value of the variable "year". The nearest I could come up with is:
dt1 <- cbind(dt1, model.matrix(~dt1$year - 1))
But this has the unfortunate consequences that the new columns are named after the formula term (dt1$year2003 and so on) rather than "2003", and that rows with NA are omitted by model.matrix (so the above command fails due to different lengths when NA is present in the year variable).
Of course I can get around these problems with more code, but I like my code to be as concise as possible (within reason), so if anyone can suggest better ways to make the dummy variables I would be obliged.
A dummy variable is a numerical variable used in regression analysis to represent subgroups of the sample in your study. In research design, a dummy variable is often used to distinguish different treatment groups.
Dummy variables are independent variables which take the value of either 0 or 1. Just as a "dummy" is a stand-in for a real person, in quantitative analysis, a dummy variable is a numeric stand-in for a qualitative fact or a logical proposition.
This is as concise as I could get. The na.action option takes care of the NA values (I would rather do this with an argument than with a global options setting, but I can't see how). The naming of columns is pretty deeply hard-coded; I don't see any way to override it within model.matrix ...
options(na.action=na.pass)
dt1 <- data.frame(year=factor(c(NA,2003:2005)))
dt2 <- setNames(cbind(dt1,model.matrix(~year-1,data=dt1)),
c("year",levels(dt1$year)))
As pointed out above, you may run into trouble in some contexts with column names that are not legal R variable names.
year 2003 2004 2005
1 <NA> NA NA NA
2 2003 1 0 0
3 2004 0 1 0
4 2005 0 0 1
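On the wish for an argument rather than a global option: one possible sketch is to build the model frame yourself with na.action = na.pass and hand that to model.matrix. This assumes model.matrix uses a supplied model frame as-is when it already carries a "terms" attribute, which is my reading of its behaviour. The names mf and mm are my own; dt1 and dt2 follow the example above.

```r
# Same toy data as above: one NA plus three years
dt1 <- data.frame(year = factor(c(NA, 2003:2005)))

# Build the model frame explicitly so na.action is an argument, not a global option
mf <- model.frame(~ year, data = dt1, na.action = na.pass)

# model.matrix() is given the ready-made frame, so the NA row is not dropped
mm <- model.matrix(~ year - 1, data = mf)

# Rename the dummy columns after the factor levels, as in the answer above
dt2 <- setNames(cbind(dt1, mm), c("year", levels(dt1$year)))
dt2
```

This leaves options() untouched, at the cost of one extra line.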
You could use ifelse(), which won't omit NA rows (but I guess you might not count it as being "as concise as possible"):
dt1 <- data.frame(year=factor(rep(2003:2010, 10))) # example data
dt1 <- within(dt1, yr2003<-ifelse(year=="2003", 1, 0))
dt1 <- within(dt1, yr2004<-ifelse(year=="2004", 1, 0))
dt1 <- within(dt1, yr2005<-ifelse(year=="2005", 1, 0))
# ...
head(dt1)
# year yr2003 yr2004 yr2005
# 1 2003 1 0 0
# 2 2004 0 1 0
# 3 2005 0 0 1
# 4 2006 0 0 0
# 5 2007 0 0 0
# 6 2008 0 0 0
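If repeating one line per year gets tedious, the same comparison can be looped over the factor levels. This is only a sketch: the "yr" prefix and the name dummies are my own choices, and the small data frame with an NA is made up for illustration.

```r
# Toy data with a missing year, to show that the row is kept
dt1 <- data.frame(year = factor(c(NA, 2003:2005)))

# One 0/1 column per level; comparing NA with a level gives NA, so the row stays
dummies <- sapply(levels(dt1$year), function(lv) as.integer(dt1$year == lv))

# Prefix the names so they are legal R variable names (yr2003, yr2004, ...)
colnames(dummies) <- paste0("yr", levels(dt1$year))

dt1 <- cbind(dt1, dummies)
dt1
```

As with ifelse(), a row whose year is NA ends up with NA in every dummy column rather than being dropped.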