How to Detect and Mark Change within a Column in Another Column

I'm trying to mark where a process starts and ends. The code should detect the row where process switches from 0 to 1 and the last row before it switches back to 0, and flag both rows in a new column.

Example data:

date  process
2007        0
2008        1
2009        1
2010        1
2011        1
2012        1
2013        0

Goal:

date  process         Status
2007        0             NA
2008        1  Process_START
2009        1             NA
2010        1             NA
2011        1             NA
2012        1    Process_END
2013        0             NA
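
For anyone who wants to run the answers, the example data can be rebuilt as a data frame. This is only a sketch: the name df1 is taken from the answers, and storing process as a numeric 0/1 column is an assumption about the original data.

```r
# Example data from the question; df1 is the name the answers use.
# Assumption: process is stored as numeric 0/1.
df1 <- data.frame(
  date    = 2007:2013,
  process = c(0, 1, 1, 1, 1, 1, 0)
)
print(df1)
```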
asked May 31 '15 at 18:05 by lnNoam

2 Answers

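One approach is to take the first difference of process and line it up against each row in both directions, so that every row sees the change arriving at it and the change leaving it: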

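dif <- diff(df1$process)  # +1 where the process switches on, -1 where it switches off
# Incoming change minus twice the outgoing change gives a code between -3 and 3
df1$Status <- factor(c(NA, dif) - 2 * c(dif, NA), levels = -3:3)
# Codes -3..0 carry no event; 1 = start, 2 = end, 3 = both on the same row
levels(df1$Status) <- c(rep(NA, 4), "Start", "End", "Start&End")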
#   date process Status
# 1 2007       0   <NA>
# 2 2008       1  Start
# 3 2009       1   <NA>
# 4 2010       1   <NA>
# 5 2011       1   <NA>
# 6 2012       1    End
# 7 2013       0   <NA>
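
To see why the arithmetic works, it may help to print the intermediate vectors. The following is a sketch that rebuilds the question's seven rows so it runs on its own; before, after and code are illustrative names that do not appear in the answer.

```r
# Question's data, rebuilt here so the snippet is self-contained.
df1 <- data.frame(date = 2007:2013, process = c(0, 1, 1, 1, 1, 1, 0))

dif    <- diff(df1$process)   # change from each row to the next
before <- c(NA, dif)          # change arriving at a row: 1 means the process just switched on
after  <- c(dif, NA)          # change leaving a row: -1 means the process is off in the next row
code   <- before - 2 * after  # 1 = start, 2 = end, 3 = start and end on the same row

print(cbind(df1, before, after, code))
```

With 0/1 input, code can only take values from -3 to 3, which is why the factor is created with levels = -3:3 and only the last three levels receive labels.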

Update

Version without factors:

dif <- diff(df1$process)
df1$Status <- c(NA, dif) - 2 * c(dif, NA)
df1$Status <- c(rep(NA,4), "Start", "End", "Start&End")[df1$Status + 4]

Note that a process lasting a single year gets the label "Start&End", because that row is both the first and the last row of its run.
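
A small sketch of the single-year case, using made-up data (df2 is hypothetical and not part of the question):

```r
# Hypothetical data: 2008 is a one-year process, 2010-2011 a two-year one.
df2 <- data.frame(date = 2007:2012, process = c(0, 1, 0, 1, 1, 0))

dif  <- diff(df2$process)
code <- c(NA, dif) - 2 * c(dif, NA)   # 3 marks a row that both starts and ends a run
df2$Status <- c(rep(NA, 4), "Start", "End", "Start&End")[code + 4]

print(df2)
```

Here the 2008 row is labelled Start&End, while 2010 and 2011 get Start and End respectively.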

Update 2

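If the series starts (or ends) with process = 1, the expected output for that row might be Start (or End) rather than NA. Padding the differences with the first value and the negated last value treats the series as if it were surrounded by zeros: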

dif <- diff(df1$process)
df1$Status <- c(df1$process[1], dif) - 2 * c(dif, -tail(df1$process,1))
df1$Status <- c(rep(NA,4), "Start", "End", "Start&End")[df1$Status + 4]
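
As a quick check of the boundary handling, here is a sketch on a made-up series that both begins and ends with process = 1 (df3 is hypothetical):

```r
# Hypothetical series that both begins and ends with process = 1.
df3 <- data.frame(date = 2007:2010, process = c(1, 1, 0, 1))

dif  <- diff(df3$process)
# Pad as if a 0 came before the first row and after the last row.
code <- c(df3$process[1], dif) - 2 * c(dif, -tail(df3$process, 1))
df3$Status <- c(rep(NA, 4), "Start", "End", "Start&End")[code + 4]

print(df3)
```

The first row is labelled Start instead of NA, and the final one-row run is labelled Start&End.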

More complicated example:

set.seed(4)
df1 <- data.frame(date = 2007:(2007+24), process = sample(c(0,1, 1), 25, TRUE))

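The last version produces the following (the sampled values depend on the R version, because the default sample() algorithm changed in R 3.6.0):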

#    date process    Status
# 1  2007       1 Start&End
# 2  2008       0      <NA>
# 3  2009       0      <NA>
# 4  2010       0      <NA>
# 5  2011       1 Start&End
# 6  2012       0      <NA>
# 7  2013       1     Start
# 8  2014       1      <NA>
# 9  2015       1       End
# 10 2016       0      <NA>
# 11 2017       1 Start&End
# 12 2018       0      <NA>
# 13 2019       0      <NA>
# 14 2020       1     Start
# 15 2021       1      <NA>
# 16 2022       1      <NA>
# 17 2023       1      <NA>
# 18 2024       1      <NA>
# 19 2025       1      <NA>
# 20 2026       1      <NA>
# 21 2027       1      <NA>
# 22 2028       1      <NA>
# 23 2029       1      <NA>
# 24 2030       1      <NA>
# 25 2031       1       End
answered Sep 21 '22 at 13:09 by bergant


One option with data.table

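library(data.table)  # v1.9.5+
# rleid() numbers each run of equal values; within every run of 1s,
# the first row is marked as the start and the last row as the end
setDT(df1)[, gr := rleid(process)][, Status := NA_character_][process == 1,
   Status := replace(Status, 1:.N %in% c(1, .N),
                     c('Process_START', 'Process_END')), by = gr][, gr := NULL]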
#     date process        Status
# 1: 2007       0            NA
# 2: 2008       1 Process_START
# 3: 2009       1            NA
# 4: 2010       1            NA
# 5: 2011       1            NA
# 6: 2012       1   Process_END
# 7: 2013       0            NA
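
For readers without data.table, the same idea can be sketched in base R. rleid() assigns one id per run of consecutive equal values, and rle() can produce an equivalent id. This is an illustration under that assumption, not the answer's code; run_id, pos, len and status are illustrative names.

```r
process <- c(0, 1, 1, 1, 1, 1, 0)   # the question's process column

# Base R stand-in for rleid(): one integer id per run of equal values.
r      <- rle(process)
run_id <- rep(seq_along(r$lengths), r$lengths)

pos <- ave(process, run_id, FUN = seq_along)  # position of each row within its run
len <- ave(process, run_id, FUN = length)     # length of the run the row belongs to

status <- rep(NA_character_, length(process))
status[process == 1 & pos == 1]   <- "Process_START"
status[process == 1 & pos == len] <- "Process_END"   # overwrites START for one-row runs

print(data.frame(process, run_id, status))
```

In this sketch a run of length one ends up labelled Process_END, because the second assignment overwrites the first.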

Update

Or a modification that also gives single-row runs their own label (Process_START_END) would be

 setDT(df1)[, gr:= rleid(process)][process==1L, Status:=c(NA, 
'Process_START', 'Process_END', 'Process_START_END')[(1:.N==1L) +
        2*(1:.N==.N)+1] , gr][,gr:=NULL]
  #   date process        Status
  #1: 2007       0            NA
  #2: 2008       1 Process_START
  #3: 2009       1            NA
  #4: 2010       1            NA
  #5: 2011       1            NA
  #6: 2012       1   Process_END
  #7: 2013       0            NA
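
The index expression is compact, so here is a sketch of what it computes for a single run, outside data.table. label_run is a hypothetical helper, with n standing in for .N.

```r
labels <- c(NA, "Process_START", "Process_END", "Process_START_END")

# Hypothetical helper: labels for one run of n consecutive rows with process == 1.
label_run <- function(n) {
  i <- seq_len(n)
  # first row adds 1, last row adds 2, and the trailing + 1 makes the index 1-based:
  # 1 = neither, 2 = first only, 3 = last only, 4 = first and last
  labels[(i == 1L) + 2 * (i == n) + 1]
}

print(label_run(1))
print(label_run(2))
print(label_run(3))
```

A run of length one maps to index 4 and therefore to Process_START_END, which the replace() version cannot express.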

Using the example from @David Arenburg's comment

 setDT(df1)[, gr:= rleid(process)][process==1L, Status:=c(NA, 
    'Process_START', 'Process_END', 'Process_START_END')[(1:.N==1L) + 
       2*(1:.N==.N)+1] , gr][,gr:=NULL]
  #   date process        Status
  #1: 2007       0            NA
  #2: 2008       1 Process_START
  #3: 2009       1            NA
  #4: 2010       1            NA
  #5: 2011       1            NA
  #6: 2012       1   Process_END
  #7: 2013       0            NA
  #8: 2013       0            NA
  #9: 2013       1 Process_START
  #10:2013       1   Process_END
  #11:2013       0            NA
  #12:2013       1 Process_START
  #13:2013       1   Process_END

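And for the complicated example in @bergant's post (the output below still shows the helper column gr, i.e. it was printed before the final gr := NULL step)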

  setDT(df1)[, gr:= rleid(process)][process==1L, Status:=c(NA, 
    'Process_START', 'Process_END', 'Process_START_END')[(1:.N==1L) + 
       2*(1:.N==.N)+1] , gr][,gr:=NULL]
 #    date process gr            Status
 # 1: 2007       1  1 Process_START_END
 # 2: 2008       0  2                NA
 # 3: 2009       0  2                NA
 # 4: 2010       0  2                NA
 # 5: 2011       1  3 Process_START_END
 # 6: 2012       0  4                NA
 # 7: 2013       1  5     Process_START
 # 8: 2014       1  5                NA
 # 9: 2015       1  5       Process_END
 #10: 2016       0  6                NA
 #11: 2017       1  7 Process_START_END
 #12: 2018       0  8                NA
 #13: 2019       0  8                NA
 #14: 2020       1  9     Process_START
 #15: 2021       1  9                NA
 #16: 2022       1  9                NA
 #17: 2023       1  9                NA
 #18: 2024       1  9                NA
 #19: 2025       1  9                NA
 #20: 2026       1  9                NA
 #21: 2027       1  9                NA
 #22: 2028       1  9                NA
 #23: 2029       1  9                NA
 #24: 2030       1  9                NA
 #25: 2031       1  9       Process_END
answered Sep 21 '22 at 13:09 by akrun