The data is as follows:
a <- c('id1','id2','id3','id4','id5')
b <- c(5,10,7,2,3)
d <- c(5.2,150,123,5,7)
e <- c(5.4,0,10,3,5)
df1 <- data.frame(a,b,d,e)
I want to create a new column in this data frame containing TRUE or FALSE. It should be TRUE if all the values in a row are within 5% of each other, and FALSE otherwise.
For example, for 'id1' the values are 5, 5.2 and 5.4 for columns b, d and e respectively. These are all within 5% of each other, so new_col should be TRUE. For 'id2' the values are 10, 150 and 0 for columns b, d and e respectively. They are not within 5% of each other, so new_col should be FALSE.
Desired Output
We can compare two columns in R by using ifelse(). This vectorized function tests a condition element by element and returns one value where the condition is TRUE and another where it is FALSE.
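As a minimal sketch of comparing two columns with ifelse(), the snippet below uses a made-up data frame (`scores`, with columns `x` and `y`, all hypothetical) and labels each row according to which column is larger:

```r
# Hypothetical data, for illustration only
scores <- data.frame(x = c(1, 5, 3), y = c(2, 5, 1))

# Nested ifelse(): the test is evaluated for every row at once
scores$cmp <- ifelse(scores$x > scores$y, "x larger",
              ifelse(scores$x < scores$y, "y larger", "equal"))

scores$cmp
# [1] "y larger" "equal"    "x larger"
```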
To find the row corresponding to the nearest value in an R data frame, we can apply which.min() to the absolute difference between the value and the column, then use single square brackets to subset that row.
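A short sketch of that nearest-value lookup, using column b from the question's data and an arbitrary target of 8 (the target is an assumption chosen for illustration):

```r
# Columns a and b from the question's data
df1 <- data.frame(a = c('id1', 'id2', 'id3', 'id4', 'id5'),
                  b = c(5, 10, 7, 2, 3))

target <- 8  # hypothetical value to search for

# Index of the row whose b is closest to the target
# (which.min() returns the first index if there is a tie)
nearest <- which.min(abs(df1$b - target))

df1[nearest, ]  # the row for id3, where b is 7
```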
R users can use T and F instead of TRUE and FALSE when they want to write logical values, but R output is always the long version, TRUE and FALSE.
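The %in% operator is used for matching values. It returns a logical vector indicating whether each element of its left operand has a match in its right operand. The related match() function “returns a vector of the positions of (first) matches of its first argument in its second”.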
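To make the matching behaviour concrete, here is a small sketch comparing %in% with match() on two made-up vectors (`ids` and `known` are hypothetical names):

```r
# Hypothetical vectors, for illustration only
ids   <- c("id1", "id3", "id9")
known <- c("id1", "id2", "id3", "id4", "id5")

in_known  <- ids %in% known    # logical: is each id present in known?
positions <- match(ids, known) # integer: position of the first match, NA if none

in_known
# [1]  TRUE  TRUE FALSE
positions
# [1]  1  3 NA
```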
This checks, for each row, whether 1.05 times the minimum value is greater than 0.95 times the maximum value. (I assumed that's what you meant by within 5% of each other.)
sapply(1:nrow(df1), function(i) (min(df1[i, 2:4]) * 1.05) >
                                (0.95 * max(df1[i, 2:4])))
# [1] TRUE FALSE FALSE FALSE FALSE
A slightly different way to do the same thing:
sapply(1:nrow(df1), function(i) diff(range(df1[i, 2:4]) *
                                     c(1.05, 0.95)) <= 0)
# [1] TRUE FALSE FALSE FALSE FALSE
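The same min/max test can also be written without looping over rows. The sketch below is one possible vectorized variant, not the answerer's own code: it uses pmin() and pmax() to get the row-wise minimum and maximum and stores the result as a logical column. It keeps the same interpretation of "within 5%" as the sapply() versions.

```r
# Data from the question
df1 <- data.frame(a = c('id1', 'id2', 'id3', 'id4', 'id5'),
                  b = c(5, 10, 7, 2, 3),
                  d = c(5.2, 150, 123, 5, 7),
                  e = c(5.4, 0, 10, 3, 5))

vals <- df1[, c("b", "d", "e")]

# Row-wise minimum and maximum, computed for all rows at once
row_min <- do.call(pmin, vals)
row_max <- do.call(pmax, vals)

# Same test as above: 1.05 * min must exceed 0.95 * max
df1$new_col <- row_min * 1.05 > row_max * 0.95

df1$new_col
# [1]  TRUE FALSE FALSE FALSE FALSE
```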
Does this work?
library(dplyr)
# Checks d against b, then e against d (not every pair of columns)
df1 %>%
  rowwise() %>%
  mutate(new_col = case_when(
    between(d, 0.95 * b, 1.05 * b) & between(e, 0.95 * d, 1.05 * d) ~ 'TRUE',
    TRUE ~ 'FALSE'
  ))
# A tibble: 5 x 5
# Rowwise:
  a         b     d     e new_col
  <chr> <dbl> <dbl> <dbl> <chr>
1 id1       5   5.2   5.4 TRUE
2 id2      10 150     0   FALSE
3 id3       7 123    10   FALSE
4 id4       2   5     3   FALSE
5 id5       3   7     5   FALSE
Is this what you're after?
a <- c('id1','id2','id3','id4','id5')
b <- c(5,10,7,2,3)
d <- c(5.2,150,123,5,7)
e <- c(5.4,0,10,3,5)
df1 <- data.frame(a,b,d,e)
library(tidyverse)
df1 %>%
  mutate(new_col = ifelse(b >= 0.95 * d & b <= 1.05 * d &
                          d >= 0.95 * e & d <= 1.05 * e,
                          "TRUE", "FALSE"))
    a  b     d    e new_col
1 id1  5   5.2  5.4    TRUE
2 id2 10 150.0  0.0   FALSE
3 id3  7 123.0 10.0   FALSE
4 id4  2   5.0  3.0   FALSE
5 id5  3   7.0  5.0   FALSE
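Note that the ifelse() call above stores the strings "TRUE" and "FALSE" rather than logical values. If a logical column is wanted, a base R sketch along these lines might do, assuming the same chained comparison (b against d, then d against e). The helper name `within_pct` and its `pct` argument are made up for this example:

```r
# Data from the question
df1 <- data.frame(a = c('id1', 'id2', 'id3', 'id4', 'id5'),
                  b = c(5, 10, 7, 2, 3),
                  d = c(5.2, 150, 123, 5, 7),
                  e = c(5.4, 0, 10, 3, 5))

# Hypothetical helper: is x within +/- pct of ref? (vectorized)
within_pct <- function(x, ref, pct = 0.05) {
  x >= (1 - pct) * ref & x <= (1 + pct) * ref
}

# Logical column; compares neighbouring columns only, as in the answer above
df1$new_col <- within_pct(df1$b, df1$d) & within_pct(df1$d, df1$e)

df1$new_col
# [1]  TRUE FALSE FALSE FALSE FALSE
```

Because the comparison is a plain logical expression, no ifelse() wrapper is needed, and the column can be used directly for subsetting, for example df1[df1$new_col, ].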