
How to compare each row's value with the previous row's value in a data frame

Tags: r

How can I compare each row's value with the previous row's value, for every column of a data frame, within groups defined by ID?

My data frame:

ID  ITEM1     ITEM2    ITEM3           
  1    A         A        A     
  2    C         B        C       
  1    A         B        C
  1    B         A        C    
  2    NA        B        F      
  3    A         A        D     
  4    R         G        J 
  4    H         T        J

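For example, with the rows sorted by ID, each change column should be 1 where the value differs from the previous row with the same ID and 0 otherwise. The first row of each ID counts as 0, and a switch to NA counts as a change: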

ID  ITEM1     ITEM2    ITEM3    ITEM1change  ITEM2change   ITEM3change                   
  1    A         A        A       0            0             0          
  1    A         B        C       0            1             1
  1    B         A        C       1            1             0 
  2    C         B        C       0            0             0
  2    NA        B        F       1            0             1
  3    A         A        D       0            0             0
  4    R         G        J       0            0             0
  4    H         T        J       1            1             0

My final output should be:

  Field          modifiedcount   unmodifiedcount   Total
  ITEM1change    3               5                 8
  ITEM2change    3               5                 8
  ITEM3change    2               6                 8

My data (dput output):

structure(list(ID = c(1, 2, 1, 1, 2, 3, 4, 4), ITEM1 = structure(c(1L, 
3L, 1L, 2L, NA, 1L, 5L, 4L), .Label = c("A", "B", "C", "H", "R"
), class = "factor"), ITEM2 = structure(c(1L, 2L, 2L, 1L, 2L, 
1L, 3L, 4L), .Label = c("A", "B", "G", "T"), class = "factor"), 
    ITEM3 = structure(c(1L, 2L, 2L, 2L, 4L, 3L, 5L, 5L), .Label = c("A", 
    "C", "D", "F", "J"), class = "factor")), .Names = c("ID", 
"ITEM1", "ITEM2", "ITEM3"), row.names = c(NA, -8L), class = "data.frame")
udya asked Jan 30 '23 15:01


1 Answer

A possible solution:

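library(dplyr)
library(tidyr)

# df is the data frame from the question
df %>% 
  # reshape to long format: one row per ID, item and value
  gather(item, value, -1) %>% 
  group_by(ID, item) %>% 
  # compare each value with the previous one in its group; the first row is
  # compared with itself, and comparisons involving NA count as a change
  mutate(change = lag(value, default = first(value)) != value,
         change = replace(change, is.na(change), TRUE)) %>% 
  group_by(item) %>% 
  summarise(modified = sum(change, na.rm = TRUE),
            unmodified = sum(!change, na.rm = TRUE)) %>% 
  mutate(total = modified + unmodified)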

which gives:

# A tibble: 3 x 4
   item modified unmodified total
  <chr>    <int>      <int> <int>
1 ITEM1        3          5     8
2 ITEM2        3          5     8
3 ITEM3        2          6     8
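
If you also want the intermediate ITEM1change, ITEM2change and ITEM3change columns shown in the question, one option is to compute the flags per column with across() and summarise afterwards. This is only a sketch under a few assumptions: it needs dplyr 1.0 or later (for across()) and tidyr 1.0 or later (for pivot_longer()), it rebuilds the question's data with character columns instead of factors, and the names changed, flags and summary_tbl are made up for this example.

```r
library(dplyr)
library(tidyr)

# Same values as the question's data, as character columns (assumption:
# the factor levels themselves are not needed).
df <- data.frame(
  ID    = c(1, 2, 1, 1, 2, 3, 4, 4),
  ITEM1 = c("A", "C", "A", "B", NA, "A", "R", "H"),
  ITEM2 = c("A", "B", "B", "A", "B", "A", "G", "T"),
  ITEM3 = c("A", "C", "C", "C", "F", "D", "J", "J"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: 1 where a value differs from the previous value in x.
# The first element is compared with itself, so it is never flagged;
# a comparison involving NA is treated as a change.
changed <- function(x) {
  prev <- dplyr::lag(x, default = x[1])
  as.integer(dplyr::coalesce(prev != x, TRUE))
}

# Wide table with one <column>change flag per ITEM column.
# order() leaves ties in their original order, so rows stay in sequence
# within each ID.
flags <- df[order(df$ID), ] %>%
  group_by(ID) %>%
  mutate(across(starts_with("ITEM"), changed, .names = "{.col}change")) %>%
  ungroup()

# Count modified and unmodified rows per flag column.
summary_tbl <- flags %>%
  pivot_longer(ends_with("change"), names_to = "Field", values_to = "change") %>%
  group_by(Field) %>%
  summarise(modifiedcount   = sum(change),
            unmodifiedcount = sum(change == 0),
            Total           = n())
```

Here flags should correspond to the intermediate table in the question, and summary_tbl should hold the same counts as the output above (3/5/8, 3/5/8 and 2/6/8).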
Jaap answered Feb 01 '23 09:02