I have a dataset that has student ids and name of major along with semester. The main complicating factor, is that if a student returns to a previous major, I want it to count as another major
StudID   Major       Term
12345    History     202101
12345    History     202102
12345    Management  202201
12345    History     202202
Desired result
StudID   Major       Term    MajorNumber
12345    History     202101  1
12345    History     202102  1
12345    Management  202201  2
12345    History     202202  3
The only thing I have tried is pulling in the earliest term for each major then doing a ranking within StudentID based on each majors earliest term, but that means that final "history" becomes a 1 instead of 3
The data.table::rleid function does this. Specifically, it will give all consecutive values that are the same the same ID, and will increment whenever there is a change.
With data.table,
library(data.table)
your_dt = as.data.table(your_data)
your_dt[, MajorNumber := rleid(Major), by = .(StudID)]
your_dt
#    StudID      Major   Term MajorNumber
# 1:  12345    History 202101           1
# 2:  12345    History 202102           1
# 3:  12345 Management 202201           2
# 4:  12345    History 202202           3
Or with dplyr:
library(dplyr)
your_data %>%
  group_by(StudID) %>%
  mutate(MajorNumber = data.table::rleid(Major)) %>%
  ungroup()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With