
Generate a dummy-variable

Tags: r, r-faq

I have had trouble generating the following dummy-variables in R:

I'm analyzing yearly time series data (time period 1948-2009). I have two questions:

  1. How do I generate a dummy variable for observation #10, i.e. for year 1957 (value = 1 at 1957 and zero otherwise)?

  2. How do I generate a dummy variable which is zero before 1957 and takes the value 1 from 1957 and onwards to 2009?

Pantera asked Aug 02 '12 23:08



2 Answers

Another option that can work better if you have many variables is factor and model.matrix.

    year.f = factor(year)
    dummies = model.matrix(~year.f)

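This will include an intercept column (all ones) and one column for each of the years in your data set except one. The omitted year is the "default" or reference level that the intercept represents; by default it is the first factor level, i.e. the earliest year.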

You can change how the "default" is chosen by messing with contrasts.arg in model.matrix.

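Also, if you want to omit the intercept, you can just drop the first column or add +0 to the end of the formula. Note that the two are not identical: dropping the first column leaves the reference year without a dummy, while +0 gives you one column for every year.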
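As a rough sketch of the options above, here is how the resulting columns compare. The `year` vector is made-up toy data (not the asker's series), and the names `with_intercept`, `dropped` and `no_intercept` are purely illustrative.

```r
# Hypothetical toy data: six observations over four distinct years
year <- c(1956, 1957, 1957, 1958, 1958, 1959)
year.f <- factor(year)

# Default: intercept column plus one dummy per year except the first level (1956)
with_intercept <- model.matrix(~ year.f)
colnames(with_intercept)

# Dropping the first column removes the intercept; 1956 still has no dummy
dropped <- with_intercept[, -1]
colnames(dropped)

# "+ 0" removes the intercept, so every year (including 1956) gets a column
no_intercept <- model.matrix(~ year.f + 0)
colnames(no_intercept)
```

Which form you want depends on whether the dummies go into a model that already has its own intercept.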

Hope this is useful.

David J. Harris answered Sep 28 '22 04:09


The simplest way to produce these dummy variables is something like the following:

    > print(year)
    [1] 1956 1957 1957 1958 1958 1959
    > dummy <- as.numeric(year == 1957)
    > print(dummy)
    [1] 0 1 1 0 0 0
    > dummy2 <- as.numeric(year >= 1957)
    > print(dummy2)
    [1] 0 1 1 1 1 1

More generally, you can use ifelse to choose between two values depending on a condition. So if instead of a 0-1 dummy variable, for some reason you wanted to use, say, 4 and 7, you could use ifelse(year == 1957, 4, 7).
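Applied to the period in the question, this might look like the sketch below. It assumes the data have exactly one observation per year from 1948 to 2009, so `year` is simply rebuilt here; the variable names are illustrative only.

```r
# Yearly index for the sample period stated in the question (assumed complete)
year <- 1948:2009

# Question 1: pulse dummy, 1 only in 1957 (observation #10)
d1957 <- as.numeric(year == 1957)

# Question 2: step dummy, 0 before 1957 and 1 from 1957 through 2009
d1957_on <- as.numeric(year >= 1957)

# The same step dummy written with ifelse
d1957_on_alt <- ifelse(year >= 1957, 1, 0)

# ifelse with arbitrary codes instead of 0/1
coded <- ifelse(year == 1957, 4, 7)
```

With real data you would use the existing year column in place of the rebuilt `year` vector.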

Martin O'Leary answered Sep 28 '22 04:09