I think I'm still a little unclear on how R operates on individual elements in vectorized statements.
I have the following code:
df1$flag <- ifelse(df1$year < 2013 &
                   df1$year == df2$year &
                   as.character(df1$code) == as.character(df2$code), 'Y', df1$flag)
And I am operating on this data
year <- c(2011, 2012, 2011, 2013, 2014, 2016, 2016, 2015, 2016, 2010)
flag <- 'N'
code <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
df1 <- data.frame(year, flag, code)
rm(year)
rm(flag)
rm(code)
year <- c(2015, 2013, 2011, 2012, 2016, 2016, 2010)
code <- c(5, 7, 3, 2, 14, 99, 10)
df2 <- data.frame(year, code)
df1$flag <- ifelse(df1$year < 2013 &
                   df1$year == df2$year &
                   as.character(df1$code) == as.character(df2$code), 'Y', df1$flag)
I want this to be the output
> df1
   year flag code
1  2011    1    1
2  2012    Y    2
3  2011    Y    3
4  2013    1    4
5  2014    1    5
6  2016    1    6
7  2016    1    7
8  2015    1    8
9  2016    1    9
10 2010    Y   10
But instead I am getting this
> df1
   year flag code
1  2011    1    1
2  2012    1    2
3  2011    Y    3
4  2013    1    4
5  2014    1    5
6  2016    1    6
7  2016    1    7
8  2015    1    8
9  2016    1    9
10 2010    1   10
I want the ifelse statement to compare each element of df1$year and df1$code to each element of df2$year and df2$code, but it doesn't look like == or %in% will do that.
To put it another way, what I want is to compare the elements like this
for(i in 1:nrow(df1)) {
  for(z in 1:nrow(df2)) {
    if(df1$year[i] < 2013 & df1$year[i] == df2$year[z] &
       as.character(df1$code[i]) == as.character(df2$code[z]))
      df1$flag[i] <- 'Y'
  }
}
Obviously using for like this GREATLY slows down execution and cannot be used, but it doesn't seem like ==, %in%, identical() or all.equal() will do what I describe in the for loop either. How can I get the output I've described in R?
ifelse works element-wise, and so do the == and & operators inside its test: element i of one vector is only ever compared with element i of the other. If the vectors differ in length, the shorter one is recycled until the lengths match.
This means that your code:
df1$flag <- ifelse(df1$year < 2013 &
                   df1$year == df2$year &
                   as.character(df1$code) == as.character(df2$code), 'Y', df1$flag)
Is equivalent to:
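for(i in 1:nrow(df1)) {
  if(df1$year[i] < 2013 & df1$year[i] == df2$year[i] &
     as.character(df1$code[i]) == as.character(df2$code[i]))
    df1$flag[i] <- 'Y'
}
This assumes that df1 and df2 have the same number of rows. In your data they do not (10 rows versus 7), so the df2 columns are recycled, and R warns that the longer object length is not a multiple of the shorter one.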
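To see the recycling in isolation, here is a minimal sketch using two made-up vectors, a and b (hypothetical names, not from the question). == pairs elements by position and recycles the shorter vector, while %in% asks whether each element of a occurs anywhere in b:

```r
# a and b are illustrative vectors, not part of the original data
a <- c(1, 2, 3, 4)
b <- c(2, 1)

# == is positional: b is recycled to 2, 1, 2, 1 before comparing
eq <- a == b
print(eq)

# %in% checks membership: is each element of a found anywhere in b?
isin <- a %in% b
print(isin)
```

With lengths 4 and 2 the recycling is silent. With lengths 10 and 7, as in the question, R also emits a warning.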
Update
This is a case for a merge rather than a for loop or ifelse. Basically, you want to merge the data sets on year and code, and then assign 'Y' to flag wherever a match was found and the year is less than 2013.
So, I added an identifier to df2 like this:
year <- c(2015, 2013, 2011, 2012, 2016, 2016, 2010)
code <- c(5, 7, 3, 2, 14, 99, 10)
flag2 <- 'Y'
#make sure the flags are not factors
df2 <- data.frame(year, code, flag2, stringsAsFactors = FALSE)
And then you just do:
#merge on year and code
newdf <- merge(df1, df2, by = c('year', 'code'), all.x = TRUE)
#assign Y to flag if year < 2013 and flag2 == Y
#(flag2 is NA for rows with no match in df2; those rows are left unchanged)
newdf$flag[newdf$year < 2013 & newdf$flag2 == 'Y'] <- 'Y'
#delete flag2
newdf$flag2 <- NULL
newdf
Output (merge sorts the rows by the key columns by default, so the row order differs from df1):
   year code flag
1  2010   10    Y
2  2011    1    N
3  2011    3    Y
4  2012    2    Y
5  2013    4    N
6  2014    5    N
7  2015    8    N
8  2016    6    N
9  2016    7    N
10 2016    9    N
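If the original row order of df1 should be preserved, one possible alternative (a sketch, not the approach used above) is to build a combined key from year and code with paste and test membership with %in%. The names key1 and key2 are made up for illustration, and the sketch assumes flag is stored as character rather than as a factor:

```r
# Rebuild the data from the question, keeping flag as character
df1 <- data.frame(
  year = c(2011, 2012, 2011, 2013, 2014, 2016, 2016, 2015, 2016, 2010),
  flag = 'N',
  code = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  year = c(2015, 2013, 2011, 2012, 2016, 2016, 2010),
  code = c(5, 7, 3, 2, 14, 99, 10)
)

# key1/key2 are hypothetical helper names: one string per (year, code) pair
key1 <- paste(df1$year, df1$code)
key2 <- paste(df2$year, df2$code)

# Flag a row when its pair occurs anywhere in df2 and its year is before 2013
df1$flag <- ifelse(df1$year < 2013 & key1 %in% key2, 'Y', df1$flag)
print(df1)
```

Because %in% checks each key against all of key2, this performs the each-against-each comparison of the nested loop without writing the loop, and the rows stay in their original order.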