
Returning TRUE if values in different columns are a close match in R

Tags:

r

The data is as follows:

a <- c('id1','id2','id3','id4','id5')
b <- c(5,10,7,2,3)
d <- c(5.2,150,123,5,7)
e <- c(5.4,0,10,3,5)

df1 <- data.frame(a,b,d,e)

I want to add a new logical column to this data frame. It should be TRUE if all the values in a row are within 5% of each other, and FALSE otherwise.

For example, for 'id1' the values in columns b, d and e are 5, 5.2 and 5.4 respectively. These are all within 5% of each other, so new_col should be TRUE. For 'id2' the values are 10, 150 and 0, which are not within 5% of each other, so new_col should be FALSE.

Desired Output

[image not available]

Rohan Bali asked Nov 24 '20 05:11




3 Answers

This checks, for each row, whether 1.05 times the minimum value is greater than 0.95 times the maximum value. (I assumed that's what you meant by within 5% of each other.)

sapply(1:nrow(df1), function(i) (min(df1[i, 2:4]) * 1.05) > 
     (0.95 * max(df1[i, 2:4])))
# [1]  TRUE FALSE FALSE FALSE FALSE

A slightly different way to do the same thing:

sapply(1:nrow(df1), function(i) diff(range(df1[i, 2:4]) * 
    c(1.05, 0.95)) <= 0)
# [1]  TRUE FALSE FALSE FALSE FALSE
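
The same rule can probably also be applied without looping over rows. The following is a minimal base R sketch, assuming the three numeric columns are b, d and e as in the question; the names vals, row_min and row_max are introduced here for illustration only.

```r
# Data from the question
df1 <- data.frame(
  a = c('id1', 'id2', 'id3', 'id4', 'id5'),
  b = c(5, 10, 7, 2, 3),
  d = c(5.2, 150, 123, 5, 7),
  e = c(5.4, 0, 10, 3, 5)
)

# Numeric columns to compare, as an unnamed list (assumed to be b, d and e)
vals <- unname(as.list(df1[, c("b", "d", "e")]))

# pmin()/pmax() compute the element-wise (here: row-wise) minimum and maximum
row_min <- do.call(pmin, vals)
row_max <- do.call(pmax, vals)

# Same rule as above: 1.05 * min must exceed 0.95 * max
df1$new_col <- row_min * 1.05 > row_max * 0.95
df1$new_col
# [1]  TRUE FALSE FALSE FALSE FALSE
```

This gives a logical column directly and avoids subsetting the data frame once per row.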

Suren answered Oct 21 '22 17:10


Does this work:

library(dplyr)
# rowwise() evaluates between() one row at a time; the result is a logical column
df1 %>% rowwise() %>% mutate(new_col = between(d, 0.95*b, 1.05*b) & between(e, 0.95*d, 1.05*d))
# A tibble: 5 x 5
# Rowwise: 
  a         b     d     e new_col
  <chr> <dbl> <dbl> <dbl> <lgl>  
1 id1       5   5.2   5.4 TRUE   
2 id2      10 150     0   FALSE  
3 id3       7 123    10   FALSE  
4 id4       2   5     3   FALSE  
5 id5       3   7     5   FALSE  
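
Note that the condition above compares only neighbouring columns (d against b, then e against d). The result can change if every pair has to pass, so it may be worth checking which reading is intended. The following is a rough base R sketch of the difference; within_tol is a hypothetical helper written for this illustration, not a function from any package.

```r
# Data from the question
df1 <- data.frame(
  a = c('id1', 'id2', 'id3', 'id4', 'id5'),
  b = c(5, 10, 7, 2, 3),
  d = c(5.2, 150, 123, 5, 7),
  e = c(5.4, 0, 10, 3, 5)
)

# Hypothetical helper: TRUE where y lies within +/- tol of x (relative to x)
within_tol <- function(x, y, tol = 0.05) {
  y >= (1 - tol) * x & y <= (1 + tol) * x
}

# Neighbouring columns only, as in the answer above
adjacent <- with(df1, within_tol(b, d) & within_tol(d, e))

# Additionally require the first and last columns to agree
all_pairs <- adjacent & with(df1, within_tol(b, e))

adjacent
# [1]  TRUE FALSE FALSE FALSE FALSE
all_pairs
# [1] FALSE FALSE FALSE FALSE FALSE
```

For 'id1', 5.4 is 8% above 5, so the stricter all-pairs reading returns FALSE, while the neighbouring-column reading matches the output the question asks for.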

Karthik S answered Oct 21 '22 16:10


Is this what you're after?

a <- c('id1','id2','id3','id4','id5')
b <- c(5,10,7,2,3)
d <- c(5.2,150,123,5,7)
e <- c(5.4,0,10,3,5)

df1 <- data.frame(a,b,d,e)
library(tidyverse)
# The comparison already yields a logical vector, so no ifelse() with strings is needed
df1 %>% 
  mutate(new_col = b >= (0.95 * d) & b <= (1.05 * d) & d >= (0.95 * e) & d <= (1.05 * e))

    a  b     d    e new_col
1 id1  5   5.2  5.4    TRUE
2 id2 10 150.0  0.0   FALSE
3 id3  7 123.0 10.0   FALSE
4 id4  2   5.0  3.0   FALSE
5 id5  3   7.0  5.0   FALSE

jared_mamrot answered Oct 21 '22 17:10