I'm trying to mark when a process starts and ends. The code needs to detect the row where the process begins (process changes from 0 to 1) and the row where it ends (the last 1 before it changes back to 0), and mark both in another column.
Example data:
date process
2007 0
2008 1
2009 1
2010 1
2011 1
2012 1
2013 0
Goal:
date process Status
2007 0 NA
2008 1 Process_START
2009 1 NA
2010 1 NA
2011 1 NA
2012 1 Process_END
2013 0 NA
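For anyone who wants to reproduce this, the example data could be built as below. The name df1 is an assumption borrowed from the answers that follow; the question itself does not name the data frame.

```r
# Example data from the question; the name df1 matches what the answers use
df1 <- data.frame(
  date = 2007:2013,
  process = c(0, 1, 1, 1, 1, 1, 0)
)
df1
```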
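Maybe by calculating diff and lagging it in both directions:
# dif[i] is the change from row i to row i + 1
dif <- diff(df1$process)
# change into each row minus twice the change out of it: a code in -3..3
df1$Status <- factor(c(NA, dif) - 2 * c(dif, NA), levels = -3:3)
# codes 1, 2, 3 mean Start, End, Start&End; all other codes become NA
levels(df1$Status) <- c(rep(NA, 4), "Start", "End", "Start&End")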
# date process Status
# 1 2007 0 <NA>
# 2 2008 1 Start
# 3 2009 1 <NA>
# 4 2010 1 <NA>
# 5 2011 1 <NA>
# 6 2012 1 End
# 7 2013 0 <NA>
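To see why the arithmetic works, it may help to look at the intermediate vectors. The sketch below rebuilds them for the example data; the names entering, leaving and code are only illustrative and do not appear in the answer.

```r
process <- c(0, 1, 1, 1, 1, 1, 0)
dif <- diff(process)          # 1 0 0 0 0 -1

entering <- c(NA, dif)        # change from the previous row into this row
leaving  <- c(dif, NA)        # change from this row to the next row

# A start has entering = 1, an end has leaving = -1, so:
#   Start     -> 1 - 2 * 0    = 1
#   End       -> 0 - 2 * (-1) = 2
#   Start&End -> 1 - 2 * (-1) = 3
code <- entering - 2 * leaving
code                          # NA 1 0 0 0 2 NA
```

The remaining codes (-3 to 0) all describe rows that are neither a start nor an end, which is why the first four levels are mapped to NA.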
Version without factors:
dif <- diff(df1$process)
df1$Status <- c(NA, dif) - 2 * c(dif, NA)
df1$Status <- c(rep(NA,4), "Start", "End", "Start&End")[df1$Status + 4]
Note that a process lasting a single year is both a start and an end, so it is labelled "Start&End".
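A small sketch of that case, using a hypothetical three-year series in which the process is active for one year only:

```r
process <- c(0, 1, 0)   # hypothetical series with a one-year process
dif <- diff(process)    # 1 -1
code <- c(NA, dif) - 2 * c(dif, NA)
status <- c(rep(NA, 4), "Start", "End", "Start&End")[code + 4]
status                  # NA "Start&End" NA
```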
If the series starts (or ends) with process = 1, the expected output for that first (or last) row might not be NA but Start (or End):
dif <- diff(df1$process)
df1$Status <- c(df1$process[1], dif) - 2 * c(dif, -tail(df1$process,1))
df1$Status <- c(rep(NA,4), "Start", "End", "Start&End")[df1$Status + 4]
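As a quick check of the boundary handling, here is the same code applied to a hypothetical four-row series that both begins and ends while the process is active:

```r
process <- c(1, 1, 0, 1)   # hypothetical: first and last rows are inside a process
dif <- diff(process)       # 0 -1 1
# Pad with process[1] on the left and -tail(process, 1) on the right,
# as if the series were surrounded by zeros
code <- c(process[1], dif) - 2 * c(dif, -tail(process, 1))
status <- c(rep(NA, 4), "Start", "End", "Start&End")[code + 4]
status                     # "Start" "End" NA "Start&End"
```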
More complicated example:
set.seed(4)
df1 <- data.frame(date = 2007:(2007+24), process = sample(c(0,1, 1), 25, TRUE))
The last version produces:
# date process Status
# 1 2007 1 Start&End
# 2 2008 0 <NA>
# 3 2009 0 <NA>
# 4 2010 0 <NA>
# 5 2011 1 Start&End
# 6 2012 0 <NA>
# 7 2013 1 Start
# 8 2014 1 <NA>
# 9 2015 1 End
# 10 2016 0 <NA>
# 11 2017 1 Start&End
# 12 2018 0 <NA>
# 13 2019 0 <NA>
# 14 2020 1 Start
# 15 2021 1 <NA>
# 16 2022 1 <NA>
# 17 2023 1 <NA>
# 18 2024 1 <NA>
# 19 2025 1 <NA>
# 20 2026 1 <NA>
# 21 2027 1 <NA>
# 22 2028 1 <NA>
# 23 2029 1 <NA>
# 24 2030 1 <NA>
# 25 2031 1 End
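One option with data.table:
library(data.table)  # v1.9.5+, for rleid()
# gr numbers each run of identical process values; within each run of 1s,
# the first row is labelled Process_START and the last row Process_END
setDT(df1)[, gr := rleid(process)][, Status := NA_character_][process == 1,
  Status := replace(Status, 1:.N %in% c(1, .N), c('Process_START',
  'Process_END')), by = gr][, gr := NULL]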
# date process Status
# 1: 2007 0 NA
# 2: 2008 1 Process_START
# 3: 2009 1 NA
# 4: 2010 1 NA
# 5: 2011 1 NA
# 6: 2012 1 Process_END
# 7: 2013 0 NA
Or a modification that also labels single-row runs (as Process_START_END) would be
setDT(df1)[, gr:= rleid(process)][process==1L, Status:=c(NA,
'Process_START', 'Process_END', 'Process_START_END')[(1:.N==1L) +
2*(1:.N==.N)+1] , gr][,gr:=NULL]
# date process Status
#1: 2007 0 NA
#2: 2008 1 Process_START
#3: 2009 1 NA
#4: 2010 1 NA
#5: 2011 1 NA
#6: 2012 1 Process_END
#7: 2013 0 NA
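The index arithmetic in the modification, (first row) + 2 * (last row) + 1, can also be tried without data.table. The following base R sketch is only an analogue under stated assumptions, not the answer's code: mark_runs is a hypothetical helper, and rle plus ave stand in for rleid and the by-group step.

```r
# Hypothetical helper: label the first and last row of every run of 1s
mark_runs <- function(process) {
  runs   <- rle(process)
  run_id <- rep(seq_along(runs$lengths), runs$lengths)  # like rleid()
  len    <- rep(runs$lengths, runs$lengths)             # length of each row's run
  pos    <- ave(seq_along(process), run_id, FUN = seq_along)  # position in run

  labels <- c(NA, "Process_START", "Process_END", "Process_START_END")
  # index 1 = neither, 2 = first only, 3 = last only, 4 = first and last
  status <- labels[(pos == 1) + 2 * (pos == len) + 1]
  status[process != 1] <- NA   # only runs of 1s are processes
  status
}

mark_runs(c(0, 1, 1, 1, 1, 1, 0))
mark_runs(c(1, 0, 1, 1))
```

For the question's data this should give Process_START in row 2 and Process_END in row 6, and for the second, made-up series a Process_START_END in row 1.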
Using the example from @David Arenburg's comment
setDT(df1)[, gr:= rleid(process)][process==1L, Status:=c(NA,
'Process_START', 'Process_END', 'Process_START_END')[(1:.N==1L) +
2*(1:.N==.N)+1] , gr][,gr:=NULL]
# date process Status
#1: 2007 0 NA
#2: 2008 1 Process_START
#3: 2009 1 NA
#4: 2010 1 NA
#5: 2011 1 NA
#6: 2012 1 Process_END
#7: 2013 0 NA
#8: 2013 0 NA
#9: 2013 1 Process_START
#10:2013 1 Process_END
#11:2013 0 NA
#12:2013 1 Process_START
#13:2013 1 Process_END
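And for the complicated example in @bergant's post (the output below still shows the helper column gr, so it was presumably printed before the final gr := NULL step)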
setDT(df1)[, gr:= rleid(process)][process==1L, Status:=c(NA,
'Process_START', 'Process_END', 'Process_START_END')[(1:.N==1L) +
2*(1:.N==.N)+1] , gr][,gr:=NULL]
# date process gr Status
# 1: 2007 1 1 Process_START_END
# 2: 2008 0 2 NA
# 3: 2009 0 2 NA
# 4: 2010 0 2 NA
# 5: 2011 1 3 Process_START_END
# 6: 2012 0 4 NA
# 7: 2013 1 5 Process_START
# 8: 2014 1 5 NA
# 9: 2015 1 5 Process_END
#10: 2016 0 6 NA
#11: 2017 1 7 Process_START_END
#12: 2018 0 8 NA
#13: 2019 0 8 NA
#14: 2020 1 9 Process_START
#15: 2021 1 9 NA
#16: 2022 1 9 NA
#17: 2023 1 9 NA
#18: 2024 1 9 NA
#19: 2025 1 9 NA
#20: 2026 1 9 NA
#21: 2027 1 9 NA
#22: 2028 1 9 NA
#23: 2029 1 9 NA
#24: 2030 1 9 NA
#25: 2031 1 9 Process_END