Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compare and identify the missing rows

Tags:

dataframe

r

I would like to compare per row 2 df based on serial and day variables and to create a new column called compare to highlight the missing rows. How can this be done in R? I tried the inner_join function without success.

Sample structure df1 and df2

enter image description here

enter image description here

Desired output:

enter image description here

Sample data

df1<-structure(list(serial = c(1, 2, 3, 4, 5), day = c(1, 0, 1, 0, 
0)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-5L), spec = structure(list(cols = list(serial = structure(list(), class = c("collector_double", 
"collector")), day = structure(list(), class = c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1L), class = "col_spec"))



df2<-structure(list(serial = c(1, 2, 3, 4, 5, 5, 7), day = c(1, 0, 
1, 0, 0, 1, 1)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -7L), spec = structure(list(cols = list(
    serial = structure(list(), class = c("collector_double", 
    "collector")), day = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1L), class = "col_spec"))
like image 739
Rfanatic Avatar asked Feb 05 '26 11:02

Rfanatic


2 Answers

One way would be to create a unique key combining the two columns and use %in% to find if the key is present in another dataset.

A base R option -

df2$compare <- do.call(paste, df2) %in% do.call(paste, df1)
df2

# A tibble: 7 x 3
#  serial   day compare
#   <dbl> <dbl> <lgl>  
#1      1     1 TRUE   
#2      2     0 TRUE   
#3      3     1 TRUE   
#4      4     0 TRUE   
#5      5     0 TRUE   
#6      5     1 FALSE  
#7      7     1 FALSE  

If there are more columns in your data apart from serial and day use the below code.

cols <- c('serial', 'day')
df2$compare <- do.call(paste, df2[cols]) %in% do.call(paste, df1[cols])
like image 117
Ronak Shah Avatar answered Feb 07 '26 02:02

Ronak Shah


We can use tidyverse

library(dplyr)
df2 %>%
    mutate(compare = TRUE) %>% 
   left_join(df1 %>%
      mutate(compare1 = TRUE), by = c('serial', 'day')) %>%
      transmute(serial, day, compare = (!is.na(compare1)))

-output

# A tibble: 7 x 3
  serial   day compare
   <dbl> <dbl> <lgl>  
1      1     1 TRUE   
2      2     0 TRUE   
3      3     1 TRUE   
4      4     0 TRUE   
5      5     0 TRUE   
6      5     1 FALSE  
7      7     1 FALSE  

Or with a faster and efficient data.table

library(data.table)
setDT(df2)[, compare := FALSE][setDT(df1), compare := TRUE, on = .(serial, day)]
like image 23
akrun Avatar answered Feb 07 '26 02:02

akrun