I have a panel dataset covering 1992 to 2023, built from annual surveys.
The ID variable is NORDEST, and the period variable is PERIODO.
I want to create a new variable that takes the value 1 the first time ADM_FEM increases by at least one unit compared with its value in the previous year, within each NORDEST group. I've checked that ADM_FEM does change over time, and that both ADM_FEM and PERIODO are numeric.
Here's my current code, but the new variable AUMENTO_PRIMERA_VEZ is 0 for every observation:
```r
panel <- panel %>%
  arrange(NORDEST, PERIODO) %>%
  group_by(NORDEST) %>%
  mutate(
    ADM_FEM_LAG = lag(ADM_FEM),
    aumento = !is.na(ADM_FEM_LAG) & (ADM_FEM > ADM_FEM_LAG),
    AUMENTO_PRIMERA_VEZ = as.integer(aumento & !duplicated(cumsum(aumento)))
  ) %>%
  ungroup()
```
What’s wrong with this logic? How can I correctly flag only the first increase in ADM_FEM for each NORDEST?
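To see what the flag line in the question actually does, here is a small base-R sketch on a made-up increase indicator for a single NORDEST group (the values are illustrative only, not from the survey). `cumsum(aumento)` goes up by one at every increase, so each increase produces a running-count value that has not appeared before, and `!duplicated()` is therefore TRUE at every increase, not only at the first. Requiring the running count to equal 1 on a row that is itself an increase keeps just the first one:

```r
# Made-up increase indicator for one NORDEST group (illustrative values only)
aumento <- c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE)

# The running count of increases goes up by one at every increase...
cumsum(aumento)
# [1] 0 0 1 1 2 3

# ...so each increase creates a value not seen before, and !duplicated()
# is TRUE at every increase: this is the question's flag expression
flag_original <- as.integer(aumento & !duplicated(cumsum(aumento)))
flag_original
# [1] 0 0 1 0 1 1

# Keeping only the first increase: an increase row where the count equals 1
flag_first <- as.integer(aumento & cumsum(aumento) == 1)
flag_first
# [1] 0 0 1 0 0 0
```

With no missing values, the original expression still flags the first increase in each group (along with all later ones), so a column that is 0 everywhere suggests `aumento` itself is never TRUE, which points at the lag step rather than the flag line.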
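Here's a possible solution using data.table. Because the rows are sorted by year within each NORDEST, I'm using diff() to get the change from the previous year (strictly, from the previous row, so this assumes no survey years are missing) and checking whether it is at least 1 unit.

```r
library(data.table)

# Simulated example panel: two NORDEST units observed every year 1992-2023
set.seed(123)
df <- data.table(NORDEST = rep(c("A", "B"), each = 32),
                 PERIODO = 1992:2023,
                 ADM_FEM = rnorm(64))
setorder(df, NORDEST, PERIODO)

# 1 for the first row whose change from the previous row is >= 1, 0 elsewhere
first_inc <- function(x) {
  idx <- which(c(NA, diff(x)) >= 1)  # which() skips the leading NA
  first <- if (length(idx) > 0) idx[1] else 0L  # 0 matches no row: no increase
  as.integer(seq_along(x) == first)
}

df[, first_increase := first_inc(ADM_FEM), by = .(NORDEST)]
```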
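To do the same for several columns at once, you could use

```r
df[, (paste0(cols, "_first_inc")) := lapply(.SD, first_inc),
   by = .(NORDEST), .SDcols = cols]
```

where `cols` is a character vector with the names of the relevant columns; each result goes into a new column named after its source column with the suffix `_first_inc`.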
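For readers who prefer to stay with the dplyr pipeline from the question, the same first-increase logic can be sketched as below. This is a sketch under stated assumptions: the small panel is made up for illustration, "an increase of at least one unit" is read as `ADM_FEM - ADM_FEM_LAG >= 1` between consecutive rows, and `lag()` is written as `dplyr::lag()`. The explicit namespace is one thing worth ruling out for the all-zero result: if `lag()` ever resolves to `stats::lag()` (for example through package masking), it does not shift the values of a plain vector, so `ADM_FEM > ADM_FEM_LAG` is never TRUE.

```r
library(dplyr)

# Made-up panel (illustrative values only): unit "A" rises by at least one
# unit in 1994 and again in 1996; unit "B" never rises by a full unit
panel <- tibble(
  NORDEST = rep(c("A", "B"), each = 5),
  PERIODO = rep(1992:1996, times = 2),
  ADM_FEM = c(3, 3, 5, 4, 6,   2, 2.5, 2.9, 1, 1.5)
)

panel <- panel %>%
  arrange(NORDEST, PERIODO) %>%
  group_by(NORDEST) %>%
  mutate(
    # dplyr::lag written out so stats::lag cannot be picked up instead
    ADM_FEM_LAG = dplyr::lag(ADM_FEM),
    # increase of at least one unit; a missing lag (first year) counts as FALSE
    aumento = coalesce(ADM_FEM - ADM_FEM_LAG >= 1, FALSE),
    # first increase: an increase row where the running count of increases is 1
    AUMENTO_PRIMERA_VEZ = as.integer(aumento & cumsum(aumento) == 1)
  ) %>%
  ungroup()
```

Here unit "A" gets a 1 only in 1994 (the 1996 increase is ignored) and unit "B" stays at 0 throughout.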