I have a dataframe (df) that looks like this:
School Student Year
A 10 1999
A 10 2000
A 20 1999
A 20 2000
A 20 2001
B 10 1999
B 10 2000
And I would like to create a person ID
column so that df looks like this:
ID School Student Year
1 A 10 1999
1 A 10 2000
2 A 20 1999
2 A 20 2000
2 A 20 2001
3 B 10 1999
3 B 10 2000
In other words, the ID
variable indicates which person it is in the dataset, accounting for both Student number and School membership (here we have 3 students total).
I did df$ID <- df$Student
and tried to request the value +1 if c("School", "Student)
was unique. It isn't working. Help appreciated.
1. type 1 into the cell which is adjacent to the first data you want to add ID number. 2. Then in the cell below it, type this formula =IF(B1=B2,A1,A1+1), press Enter key to get the first result, drag fill handle down until last data showing up.
In a database or spreadsheet, unique identifiers may be designated as a specific column or field to help make sorting and filtering through information easier. This also helps trace information back to a specific user or entity within the system.
We can do this in base R
without doing any group by operation
df$ID <- cumsum(!duplicated(df[1:2]))
df
# School Student Year ID
#1 A 10 1999 1
#2 A 10 2000 1
#3 A 20 1999 2
#4 A 20 2000 2
#5 A 20 2001 2
#6 B 10 1999 3
#7 B 10 2000 3
NOTE: Assuming that 'School' and 'Student' are ordered
Or using tidyverse
library(dplyr)
df %>%
mutate(ID = group_indices_(df, .dots=c("School", "Student")))
# School Student Year ID
#1 A 10 1999 1
#2 A 10 2000 1
#3 A 20 1999 2
#4 A 20 2000 2
#5 A 20 2001 2
#6 B 10 1999 3
#7 B 10 2000 3
As @radek mentioned, in the recent version (dplyr_0.8.0
), we get the notification that group_indices_
is deprecated, instead use group_indices
df %>%
mutate(ID = group_indices(., School, Student))
Group by School and Student, then assign group id to ID
variable.
library('data.table')
df[, ID := .GRP, by = .(School, Student)]
# School Student Year ID
# 1: A 10 1999 1
# 2: A 10 2000 1
# 3: A 20 1999 2
# 4: A 20 2000 2
# 5: A 20 2001 2
# 6: B 10 1999 3
# 7: B 10 2000 3
Data:
df <- fread('School Student Year
A 10 1999
A 10 2000
A 20 1999
A 20 2000
A 20 2001
B 10 1999
B 10 2000')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With