%in%, == or something else to compare multiple values

Tags:

r

vector

I think I'm still a little unclear on how R works on individual elements in vectorized statements.

I have the following code

df1$flag <- ifelse(df1$year < 2013 &
        df1$year == df2$year &
        as.character(df1$code) == as.character(df2$code), 'Y', df1$flag)

And I am operating on this data

year <- c(2011, 2012, 2011, 2013, 2014, 2016, 2016, 2015, 2016, 2010)
flag <- 'N'
code <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
df1 <- data.frame(year, flag, code)

rm(year)
rm(flag)
rm(code)

year <- c(2015, 2013, 2011, 2012, 2016, 2016, 2010)
code <- c(5, 7, 3, 2, 14, 99, 10)
df2 <- data.frame(year, code)

df1$flag <- ifelse(df1$year < 2013 &
                     df1$year == df2$year &
                     as.character(df1$code) == as.character(df2$code), 'Y', df1$flag)

I want this to be the output

> df1
   year flag code
1  2011    1    1
2  2012    Y    2
3  2011    Y    3
4  2013    1    4
5  2014    1    5
6  2016    1    6
7  2016    1    7
8  2015    1    8
9  2016    1    9
10 2010    Y   10

But instead I am getting this

> df1
   year flag code
1  2011    1    1
2  2012    1    2
3  2011    Y    3
4  2013    1    4
5  2014    1    5
6  2016    1    6
7  2016    1    7
8  2015    1    8
9  2016    1    9
10 2010    1   10
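The 1s in the flag column of both outputs above are most likely factor codes, not data. Before R 4.0.0, data.frame() defaulted to stringsAsFactors = TRUE, so flag became a factor with the single level "N". When that factor is passed as the "no" argument of ifelse(), the result is filled with the factor's underlying integer codes. The minimal sketch below reproduces this with an explicit factor() call, so it should behave the same on any R version. The names flag, test, res and res_chr are illustrative only.

```r
# Recreate the pre-R-4.0 situation explicitly: flag is a factor with one level, "N"
flag <- factor(rep('N', 3))
test <- c(TRUE, FALSE, FALSE)

# The "no" branch contributes the factor's integer code (1), not its label
res <- ifelse(test, 'Y', flag)
print(res)

# Converting the factor to character first keeps the labels
res_chr <- ifelse(test, 'Y', as.character(flag))
print(res_chr)
```

Building the data frames with stringsAsFactors = FALSE avoids the issue. That is also the default from R 4.0.0 on.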

I want the ifelse statement to compare each element of df1$year and df1$code to each element of df2$year and df2$code, but it doesn't look like == or %in% will do that.

To put it another way, what I want is to compare the elements like this

for(i in seq_len(nrow(df1))) {
    for(z in seq_len(nrow(df2))) {
        if(df1$year[i] < 2013 && df1$year[i] == df2$year[z] &&
           as.character(df1$code[i]) == as.character(df2$code[z]))
            df1$flag[i] <- 'Y'
    }
}

Obviously, a nested for loop like this GREATLY slows down execution and can't be used here, but it doesn't seem like ==, %in%, identical() or all.equal() will do what the loop does either. How can I get the output I've described in R?
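One possible way to vectorize the nested loop above is sketched below. It is a sketch, not the approach taken in the answer. outer() builds an nrow(df1) x nrow(df2) logical matrix whose [i, z] entry is the same test the inner loop performs for that pair. rowSums() then checks whether any z matched for each row i. This assumes both matrices fit in memory, which may not hold for very large tables. The names same_year, same_code and any_match are illustrative.

```r
# Data from the question; stringsAsFactors = FALSE keeps flag as character
df1 <- data.frame(year = c(2011, 2012, 2011, 2013, 2014, 2016, 2016, 2015, 2016, 2010),
                  flag = 'N',
                  code = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                  stringsAsFactors = FALSE)
df2 <- data.frame(year = c(2015, 2013, 2011, 2012, 2016, 2016, 2010),
                  code = c(5, 7, 3, 2, 14, 99, 10))

# [i, z] is TRUE when df1 row i and df2 row z agree on that column,
# i.e. the comparison the inner loop makes for each (i, z) pair
same_year <- outer(df1$year, df2$year, "==")
same_code <- outer(df1$code, df2$code, "==")

# Row i matches if at least one df2 row agrees on both year and code
any_match <- rowSums(same_year & same_code) > 0

df1$flag[df1$year < 2013 & any_match] <- 'Y'
df1
```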

asked Apr 21 '26 at 23:04 by jamzsabb


1 Answer

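ifelse works element-wise. The comparisons == and & produce one TRUE/FALSE per position, and ifelse then picks 'Y' or df1$flag position by position. If the vectors differ in length, the shorter one is recycled to the length of the longer one, and R warns when the longer length is not a multiple of the shorter.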

This means that your code:

df1$flag <- ifelse(df1$year < 2013 &
        df1$year == df2$year &
        as.character(df1$code) == as.character(df2$code), 'Y', df1$flag)

Is equivalent to:

for(i in seq_len(nrow(df1))) {
    if(df1$year[i] < 2013 && df1$year[i] == df2$year[i] &&
       as.character(df1$code[i]) == as.character(df2$code[i]))
        df1$flag[i] <- 'Y'
}

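This assumes that df1 and df2 have the same number of rows. If they don't, df2's columns are recycled, so row i of df1 is compared only with row ((i - 1) %% nrow(df2)) + 1 of df2, never with every row of df2.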
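To see what happens with the question's data (10 rows in df1, 7 in df2), the sketch below computes which df2 row each df1 row is paired with under recycling. The name partner is illustrative. Only row 3 happens to line up with an identical df2 row, which is why it is the only Y in the question's output.

```r
df1 <- data.frame(year = c(2011, 2012, 2011, 2013, 2014, 2016, 2016, 2015, 2016, 2010),
                  code = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
df2 <- data.frame(year = c(2015, 2013, 2011, 2012, 2016, 2016, 2010),
                  code = c(5, 7, 3, 2, 14, 99, 10))

# df2 row paired with each df1 row: 1..7, then 1, 2, 3 again
partner <- rep_len(seq_len(nrow(df2)), nrow(df1))
print(data.frame(df1_row = seq_len(nrow(df1)), df2_row = partner))

# == recycles the shorter vector (and warns, since 10 is not a multiple of 7)
same_year <- suppressWarnings(df1$year == df2$year)
same_code <- suppressWarnings(df1$code == df2$code)

# Identical to indexing df2 explicitly with the recycled row numbers
stopifnot(identical(same_year, df1$year == df2$year[partner]))

# Only df1 row 3 (2011, code 3) meets an identical df2 row at its recycled position
matched <- which(df1$year < 2013 & same_year & same_code)
print(matched)
```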


Update

This is a case for a merge, not for a for loop or ifelse. Basically, you want to merge the data sets on year and code and then, if the year is less than 2013, assign 'Y' to flag.

So, I added an identifier to df2 like this:

year <- c(2015, 2013, 2011, 2012, 2016, 2016, 2010)
code <- c(5, 7, 3, 2, 14, 99, 10)
flag2 <- 'Y'
#make sure the flags are not factors
df2 <- data.frame(year, code, flag2, stringsAsFactors = FALSE)

And then you just do:

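#in R < 4.0, data.frame() made df1$flag a factor; convert it so 'Y' can be assigned
df1$flag <- as.character(df1$flag)
#merge on year and code, keeping every row of df1
newdf <- merge(df1, df2, by = c('year', 'code'), all.x = TRUE)
#assign Y to flag if year < 2013 and flag2 == Y (unmatched rows have flag2 = NA and are skipped)
newdf$flag[newdf$year < 2013 & newdf$flag2 == 'Y'] <- 'Y'
#delete flag2
newdf$flag2 <- NULL
newdf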

Out

   year code flag
1  2010   10    Y
2  2011    1    N
3  2011    3    Y
4  2012    2    Y
5  2013    4    N
6  2014    5    N
7  2015    8    N
8  2016    6    N
9  2016    7    N
10 2016    9    N
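Since the title asks about %in%, it can also do this job once year and code are combined into a single key per row. The sketch below is an alternative to merge(), not part of the answer above. It pastes the two columns together with a separator and tests membership. The names key1 and key2 are illustrative. Unlike merge(), it leaves df1 in its original row order.

```r
df1 <- data.frame(year = c(2011, 2012, 2011, 2013, 2014, 2016, 2016, 2015, 2016, 2010),
                  flag = 'N',
                  code = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                  stringsAsFactors = FALSE)
df2 <- data.frame(year = c(2015, 2013, 2011, 2012, 2016, 2016, 2010),
                  code = c(5, 7, 3, 2, 14, 99, 10))

# One string key per row, e.g. "2011_3"; the separator keeps
# year 201 + code 13 distinct from year 2011 + code 3
key1 <- paste(df1$year, df1$code, sep = "_")
key2 <- paste(df2$year, df2$code, sep = "_")

# %in% asks, for each df1 key, whether it appears anywhere in df2
df1$flag[df1$year < 2013 & key1 %in% key2] <- 'Y'
df1
```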
answered Apr 23 '26 at 14:04 by LyzandeR