
How to convert foreach in Stata to R?

Tags: r, stata

I have a data frame (df) with variables such as CA, VT, NC, AZ, CAvalue, VTvalue, NCvalue, AZvalue etc.

In Stata, I can use the foreach command and generate new variables:

foreach x in CA VT NC AZ {
    gen `x'1 = 0
    replace `x'1 = 1 if `x'value > 1
}

When I tried to convert this code to R, I ran into trouble.

Here's what I wrote:

x=c("CA","VT","NC","AZ")
x_1=paste(x,"1",sep="")
m1=as.data.frame(matrix(0,ncol=length(x),nrow=NROW(df)))
colnames(m1)=x_1

I have no problem creating the new variables ending in "1", but I don't know how to convert the line starting with "replace". I tried creating another vector with CAtime, VTtime, NCtime, and AZtime, but I don't know how to incorporate them into the loop without writing it four times.

UPDATE: Originally, my data looked something like this:

df=as.data.frame(matrix(runif(200,1,150),ncol=8,nrow=25))
name=c("CA","VT","NC","AZ","CAtime","VTtime", "NCtime","AZtime")
colnames(df)=name

Then I want to create 4 new variables, CA1, VT1, NC1, and AZ1, in a new data frame m1:

x=c("CA","VT","NC","AZ")
x_1=paste(x,"1",sep="")
m1=as.data.frame(matrix(0,ncol=length(x),nrow=NROW(df)))
colnames(m1)=x_1

All values in m1 start at 0.

Then, wherever CAtime > 1, I want the corresponding cell of CA1 set to 1. The same applies to all four variables CAtime, VTtime, NCtime, and AZtime. I don't want to write the loop four times, and that's why I am stuck.

SXS asked Feb 10 '15 04:02


2 Answers

Take an example dataset df, matching your description:

set.seed(1)
x <- c("CA","VT","NC","AZ")
df <- setNames(data.frame(replicate(8,sample(0:2,5,replace=TRUE),simplify=FALSE)),
      c("CA","VT","NC","AZ","CAvalue","VTvalue","NCvalue","AZvalue"))
df

#  CA VT NC AZ CAvalue VTvalue NCvalue AZvalue
#1  0  2  0  1       2       1       1       2
#2  1  2  0  2       0       0       1       2
#3  1  1  2  2       1       1       1       0
#4  2  1  1  1       0       2       0       2
#5  0  0  2  2       0       1       2       1

Now use lapply to check whether value > 1 in each of the value columns, and assign the results to new variables with a 1 appended to the name:

df[paste0(x,"1")] <- lapply(df[paste0(x,"value")], function(n) as.numeric(n > 1) )
df

#  CA VT NC AZ CAvalue VTvalue NCvalue AZvalue CA1 VT1 NC1 AZ1
#1  0  2  0  1       2       1       1       2   1   0   0   1
#2  1  2  0  2       0       0       1       2   0   0   0   1
#3  1  1  2  2       1       1       1       0   0   0   0   0
#4  2  1  1  1       0       2       0       2   0   1   0   1
#5  0  0  2  2       0       1       2       1   0   0   1   0
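
For readers who want something that reads closer to the Stata foreach, the same result can be sketched with an explicit for loop. This is only an illustration under the assumption that the columns follow the CAvalue/VTvalue naming used above; the loop variable s and the helper names src and new are made up for the example.

```r
# Example data with the same layout as above
set.seed(1)
x <- c("CA", "VT", "NC", "AZ")
df <- setNames(
  data.frame(replicate(8, sample(0:2, 5, replace = TRUE), simplify = FALSE)),
  c(x, paste0(x, "value"))
)

# One pass per state, mirroring `foreach x in CA VT NC AZ { ... }`
for (s in x) {
  src <- paste0(s, "value")  # source column, e.g. "CAvalue"
  new <- paste0(s, "1")      # new column, e.g. "CA1"
  # 1 where the source is > 1, otherwise 0 (gen + replace in one step)
  df[[new]] <- as.numeric(df[[src]] > 1)
}
```

One difference to keep in mind: Stata treats a missing value as larger than any number, so `replace ... if x > 1` also sets missing rows to 1, whereas in R the comparison NA > 1 yields NA.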
thelatemail answered Oct 24 '22 07:10


Here is one option using set from data.table, which is efficient because it updates by reference. The vectors x and x1 are defined in the data section at the end.

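library(data.table)
# Convert df to a data.table in place and add the new columns as numeric placeholders
setDT(df)[, (x1) := NA_real_]
x2 <- paste0(x, 'value')        # source columns: CAvalue, VTvalue, ...
indx <- match(x1, names(df))    # positions of the new columns CA1, VT1, ...
for (j in seq_along(x2)) {
  # Fill each new column by reference: 1 where the source is > 1, else 0
  set(df, i = NULL, j = indx[j], value = as.numeric(df[[x2[j]]] > 1))
}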
df
#   CA VT NC AZ CAvalue VTvalue NCvalue AZvalue CA1 VT1 NC1 AZ1
#1:  0  2  0  1       2       1       1       2   1   0   0   1
#2:  1  2  0  2       0       0       1       2   0   0   0   1
#3:  1  1  2  2       1       1       1       0   0   0   0   0
#4:  2  1  1  1       0       2       0       2   0   1   0   1
#5:  0  0  2  2       0       1       2       1   0   0   1   0

Update

If the new columns are needed in a separate dataset, we could subset the result to form one. Alternatively, using a modified example (df1 and df2 are defined in the data section below):

setDT(df1)
setDT(df2)
x2 <- paste0(x, 'time')
for (j in seq_along(x2)) {
  set(df2, i = NULL, j = j, value = as.numeric(df1[[x2[j]]] > 1))
}

head(df2, 4)
#   CA1 VT1 NC1 AZ1
#1:   0   0   1   1
#2:   0   1   1   0
#3:   0   0   0   1
#4:   1   1   0   0
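
If data.table is not available, roughly the same separate data frame can be built in base R. The sketch below assumes the CAtime/VTtime layout from the question's update and names the result m1 as the question does; it creates the indicator columns directly rather than pre-filling a frame of zeros.

```r
# Example data shaped like df1 in the data section
set.seed(425)
x <- c("CA", "VT", "NC", "AZ")
df1 <- as.data.frame(matrix(rnorm(200, 1, 150), ncol = 8, nrow = 25))
colnames(df1) <- c(x, paste0(x, "time"))

# Check each *time column against 1 and collect the results in a new data frame
m1 <- as.data.frame(lapply(df1[paste0(x, "time")], function(v) as.numeric(v > 1)))
colnames(m1) <- paste0(x, "1")  # CA1, VT1, NC1, AZ1
```

Because the columns are selected by name, this does not depend on the order of the columns in df1.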

data

set.seed(1)
x <- c("CA","VT","NC","AZ")
x1 <- paste0(x, 1)

df <- setNames(data.frame(replicate(8,sample(0:2,5,replace=TRUE),
   simplify=FALSE)),c("CA","VT","NC","AZ","CAvalue","VTvalue","NCvalue",
"AZvalue"))

set.seed(425)
df1 <- as.data.frame(matrix(rnorm(200,1,150),ncol=8,nrow=25))
name <- c("CA","VT","NC","AZ","CAtime","VTtime", "NCtime","AZtime")
colnames(df1) <- name

df2 <- as.data.frame(matrix(0,ncol=length(x),nrow=NROW(df1)))
colnames(df2) <- x1
akrun answered Oct 24 '22 06:10