I have a data frame (df) with variables such as CA, VT, NC, AZ, CAvalue, VTvalue, NCvalue, AZvalue etc.
In Stata, I can use the foreach command to generate new variables:
foreach x in CA VT NC AZ {
gen `x'1 = 0
replace `x'1 = 1 if `x'value > 1
}
When I tried to convert this code to R, I ran into problems.
Here's what I wrote:
x=c("CA","VT","NC","AZ")
x_1=paste(x,"1",sep="")
m1=as.data.frame(matrix(0,ncol=length(x),nrow=NROW(df)))
colnames(m1)=x_1
I have no problem creating the new variables ending in "1", but I don't know how to translate the line starting with "replace". I tried creating another vector with CAtime, VTtime, NCtime, and AZtime, but I don't know how to use it in a loop without writing the code four times.
UPDATE: My original data look something like this:
df=as.data.frame(matrix(runif(200,1,150),ncol=8,nrow=25))
name=c("CA","VT","NC","AZ","CAtime","VTtime", "NCtime","AZtime")
colnames(df)=name
Then I want to create four new variables, CA1, VT1, NC1, and AZ1, in a new data frame m1:
x=c("CA","VT","NC","AZ")
x_1=paste(x,"1",sep="")
m1=as.data.frame(matrix(0,ncol=length(x),nrow=NROW(df)))
colnames(m1)=x_1
All values in m1 start at 0.
Then, if CAtime > 1, I want the corresponding cell in CA1 to be 1, and likewise for VTtime, NCtime, and AZtime. I don't want to write four loops, which is why I am stuck.
Take an example dataset df, matching your description:
set.seed(1)
x <- c("CA","VT","NC","AZ")
df <- setNames(data.frame(replicate(8,sample(0:2,5,replace=TRUE),simplify=FALSE)),
c("CA","VT","NC","AZ","CAvalue","VTvalue","NCvalue","AZvalue"))
df
# CA VT NC AZ CAvalue VTvalue NCvalue AZvalue
#1 0 2 0 1 2 1 1 2
#2 1 2 0 2 0 0 1 2
#3 1 1 2 2 1 1 1 0
#4 2 1 1 1 0 2 0 2
#5 0 0 2 2 0 1 2 1
Now lapply a check for value > 1 across each of the value columns, and assign the results to new variables with a 1 appended to the name:
df[paste0(x,"1")] <- lapply(df[paste0(x,"value")], function(n) as.numeric(n > 1) )
df
# CA VT NC AZ CAvalue VTvalue NCvalue AZvalue CA1 VT1 NC1 AZ1
#1 0 2 0 1 2 1 1 2 1 0 0 1
#2 1 2 0 2 0 0 1 2 0 0 0 1
#3 1 1 2 2 1 1 1 0 0 0 0 0
#4 2 1 1 1 0 2 0 2 0 1 0 1
#5 0 0 2 2 0 1 2 1 0 0 1 0
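If you'd rather keep something that reads like the Stata foreach, a plain for loop over the state codes should do the same job. This is only a sketch on a small made-up data frame (the values below are invented for illustration, not your data):

```r
# Hypothetical toy data: two state columns plus their "value" columns
df <- data.frame(CA = c(0, 1, 2), VT = c(2, 0, 1),
                 CAvalue = c(2, 0, 1), VTvalue = c(1, 3, 0))
x <- c("CA", "VT")

for (s in x) {
  # Counterpart of: gen `s'1 = 0, then replace `s'1 = 1 if `s'value > 1
  df[[paste0(s, "1")]] <- as.numeric(df[[paste0(s, "value")]] > 1)
}

df$CA1  # 1 0 0
df$VT1  # 0 1 0
```

One difference to keep in mind: where the value column is NA, R gives NA in the new column, whereas Stata treats missing as larger than any number, so the replace line would set it to 1.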
Here is a possible option using set from data.table, which is efficient because it updates by reference. The data, including x and x1, are given in the data block at the end.
library(data.table)
setDT(df)[,(x1):= NA]
x2 <- paste0(x, 'value')
indx <- match(x1, names(df))
for(j in seq_along(x2)){
set(df, i=NULL, j=indx[j], value=as.numeric(df[[x2[j]]]>1))
}
df
# CA VT NC AZ CAvalue VTvalue NCvalue AZvalue CA1 VT1 NC1 AZ1
#1: 0 2 0 1 2 1 1 2 1 0 0 1
#2: 1 2 0 2 0 0 1 2 0 0 0 1
#3: 1 1 2 2 1 1 1 0 0 0 0 0
#4: 2 1 1 1 0 2 0 2 0 1 0 1
#5: 0 0 2 2 0 1 2 1 0 0 1 0
If we need the new columns in another dataset, we could subset the results to form one. Or, using a modified example (df1 and df2 are defined in the data block at the end):
setDT(df1)
setDT(df2)
x2 <- paste0(x, 'time')
for(j in seq_along(x2)){
set(df2, i=NULL, j=j, value=as.numeric(df1[[x2[j]]] >1))
}
head(df2,4)
# CA1 VT1 NC1 AZ1
#1: 0 0 1 1
#2: 0 1 1 0
#3: 0 0 0 1
#4: 1 1 0 0
# Data used in the data.table examples above
set.seed(1)
x <- c("CA","VT","NC","AZ")
x1 <- paste0(x, 1)
df <- setNames(data.frame(replicate(8,sample(0:2,5,replace=TRUE),
simplify=FALSE)),c("CA","VT","NC","AZ","CAvalue","VTvalue","NCvalue",
"AZvalue"))
set.seed(425)
df1 <- as.data.frame(matrix(rnorm(200,1,150),ncol=8,nrow=25))
name <- c("CA","VT","NC","AZ","CAtime","VTtime", "NCtime","AZtime")
colnames(df1) <- name
df2 <- as.data.frame(matrix(0,ncol=length(x),nrow=NROW(df1)))
colnames(df2) <- x1
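For the case in the update (time columns in one data frame, 0/1 indicators in a separate one), the same lapply idea should also work in base R without data.table. A minimal sketch, reusing the question's names df, m1, x and x_1; the seed and the runif range of 0 to 3 are my own assumptions, chosen so that some values fall below 1:

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
x <- c("CA", "VT", "NC", "AZ")

# Simulated stand-in for the real data (range 0..3 is an assumption)
df <- as.data.frame(matrix(runif(200, 0, 3), ncol = 8, nrow = 25))
colnames(df) <- c(x, paste0(x, "time"))

x_1 <- paste0(x, "1")
# Build m1 directly from the time columns: 1 where time > 1, else 0
m1 <- as.data.frame(lapply(df[paste0(x, "time")],
                           function(n) as.numeric(n > 1)))
colnames(m1) <- x_1

head(m1, 4)
```

Building m1 from the comparison directly avoids the separate step of filling it with zeros first.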