
How to create a conditional dummy in R?

Tags: loops, dataframe, r

I have a dataframe of time series data with daily temperature observations. I need to create a dummy variable that counts each day with a temperature above a threshold of 5 °C. This would be easy in itself, but there is an additional condition: counting starts only once ten consecutive days above the threshold have occurred. Here's an example dataframe:

df <- data.frame(date = seq(365), 
         temp = -30 + 0.65*seq(365) - 0.0018*seq(365)^2 + rnorm(365))

I think I got it done, but with too many loops for my liking. This is what I did:

df$dummyUnconditional <- 0
df$dummyHead <- 0
df$dummyTail <- 0

for(i in 1:nrow(df)){
    if(df$temp[i] > 5){
        df$dummyUnconditional[i] <- 1
    }
}

for(i in 1:(nrow(df)-9)){
    if(sum(df$dummyUnconditional[i:(i+9)]) == 10){
        df$dummyHead[i] <- 1
    }
}

# start at 10 so that (i-9) is a valid index (R silently drops index 0)
for(i in 10:nrow(df)){
    if(sum(df$dummyUnconditional[(i-9):i]) == 10){
        df$dummyTail[i] <- 1
    }
}

df$dummyConditional <- ifelse(df$dummyHead == 1 | df$dummyTail == 1, 1, 0)

Could anyone suggest simpler ways for doing this?

Antti asked Feb 01 '16 14:02




1 Answer

Here's a base R option using rle:

df$dummy <- with(rle(df$temp > 5), rep(as.integer(values & lengths >= 10), lengths))

Some explanation: The task is a classic use case for the run length encoding (rle) function, imo. We first check whether the value of temp is greater than 5 (creating a logical vector) and apply rle to that vector, which results in the following (the exact run lengths will vary, because temp contains random noise):

> rle(df$temp > 5)
#Run Length Encoding
#  lengths: int [1:7] 66 1 1 225 2 1 69
#  values : logi [1:7] FALSE TRUE FALSE TRUE FALSE TRUE ...

Now we want to find the runs where values is TRUE (i.e. temp is greater than 5) and where, at the same time, lengths is at least 10 (i.e. at least ten consecutive temp values are greater than 5). We do this by running:

values & lengths >= 10
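
To see what this expression does at the run level, here is a small sketch on a made-up vector. The values in `temp_toy` and the minimum run length of 3 (instead of 10) are assumptions chosen only to keep the output short:

```r
# Toy temperatures (made up for illustration); threshold 5, minimum run length 3
temp_toy <- c(6, 7, 2, 8, 9, 10, 3)

r <- rle(temp_toy > 5)
r$lengths   # 2 1 3 1
r$values    # TRUE FALSE TRUE FALSE

# >= binds tighter than &, so this reads as values & (lengths >= 3)
run_mask <- r$values & r$lengths >= 3
run_mask    # FALSE FALSE TRUE FALSE
```

Only the third run is both above the threshold and long enough, so it is the only one marked TRUE.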

And finally, since we want to return a vector of the same length as nrow(df), we use rep(..., lengths) to expand each run-level result back to one value per row, and as.integer in order to return 1/0 instead of TRUE/FALSE.
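
If the threshold or the minimum run length should not be hard-coded, the one-liner could be wrapped in a small helper. This is only a sketch: the name `flag_long_runs` and its arguments are hypothetical, not part of base R, and the input vector is made up:

```r
# Hypothetical helper: flag every element that belongs to a run of at least
# `min_run` consecutive values strictly above `threshold`.
flag_long_runs <- function(x, threshold = 5, min_run = 10) {
  r <- rle(x > threshold)
  # Decide per run, then expand back to one 0/1 value per element
  rep(as.integer(r$values & r$lengths >= min_run), r$lengths)
}

# Made-up input: the run of 2 is too short, the run of 3 qualifies
toy <- c(6, 7, 2, 8, 9, 10, 3)
flag_long_runs(toy, threshold = 5, min_run = 3)
# 0 0 0 1 1 1 0
```

With the defaults, `df$dummy <- flag_long_runs(df$temp)` should give the same column as the one-liner. Like the one-liner, it flags every day of a qualifying run, including the first ones.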

talat answered Sep 28 '22 02:09