The data set contains three variables: id, sex, and grade (factor).
mydata <- data.frame(id=c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4), sex=c(1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1),
grade=c("a","b","c","d","e", "x","y","y","x", "q","q","q","q", "a", "a", "a", NA, "b"))
For each id, I need to count how many unique grades there are and store that count in a new column, N. For instance, id 1 has five unique grades, so N = 5; id 2 has two unique grades, so N = 2; and id 4 has two unique grades once NA is ignored, so N = 2.
The final data set is
mydata <- data.frame(id=c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4), sex=c(1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1),
grade=c("a","b","c","d","e", "x","y","y","x", "q","q","q","q", "a", "a", "a", NA, "b"))
mydata$N <- c(5,5,5,5,5,2,2,2,2,1,1,1,1,2,2,2,2,2)
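The counting rule can be checked with a minimal base-R sketch. It is not part of either answer below. It uses tapply() to count the distinct non-NA grades per id, giving one value per id rather than a new column. The sex column is left out because it plays no role in the count.

```r
# Same id/grade data as in the question (sex omitted, it does not affect N)
mydata <- data.frame(
  id    = c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4),
  grade = c("a","b","c","d","e", "x","y","y","x", "q","q","q","q",
            "a","a","a",NA,"b")
)

# For each id: drop missing grades, keep distinct values, count them
counts <- tapply(mydata$grade, mydata$id,
                 function(g) length(unique(g[!is.na(g)])))
print(counts)  # one count per id: 5, 2, 1, 2
```

These per-id counts are exactly the values that get repeated on every row in the N column above.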
New answer:
The uniqueN function of data.table has an na.rm argument, which we can use as follows:
library(data.table)
setDT(mydata)[, n := uniqueN(grade, na.rm = TRUE), by = id]
which gives:
> mydata
    id sex grade n
 1:  1   1     a 5
 2:  1   1     b 5
 3:  1   1     c 5
 4:  1   1     d 5
 5:  1   1     e 5
 6:  2   0     x 2
 7:  2   0     y 2
 8:  2   0     y 2
 9:  2   0     x 2
10:  3   0     q 1
11:  3   0     q 1
12:  3   0     q 1
13:  3   0     q 1
14:  4   1     a 2
15:  4   1     a 2
16:  4   1     a 2
17:  4   1    NA 2
18:  4   1     b 2
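The na.rm argument matters because uniqueN() otherwise treats NA as a distinct value of its own. Here is a small sketch using the grades of id 4 from the example data, where leaving NA in would give 3 instead of the wanted 2:

```r
library(data.table)

# Grades of id 4 from the example, including the missing value
g <- c("a", "a", "a", NA, "b")

n_with_na    <- uniqueN(g)                # NA counted as a value: 3
n_without_na <- uniqueN(g, na.rm = TRUE)  # NA dropped before counting: 2
print(c(n_with_na, n_without_na))
```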
Old answer:
With data.table you could do this as follows:
library(data.table)
setDT(mydata)[, n := uniqueN(grade[!is.na(grade)]), by = id]
or:
setDT(mydata)[, n := uniqueN(na.omit(grade)), by = id]
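If data.table is not available, the same column can be built in base R. This is a swapped-in technique, not part of the original answer. It is a sketch assuming base R's ave(), which applies a function within each group and writes the result back to every row of that group. Because ave() returns a vector of the same type as its first argument (here character), the count is converted back with as.integer().

```r
mydata <- data.frame(
  id    = c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4),
  sex   = c(1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1),
  grade = c("a","b","c","d","e", "x","y","y","x", "q","q","q","q",
            "a","a","a",NA,"b"),
  stringsAsFactors = FALSE  # keep grade as character so ave() can write into it
)

# Within each id, count distinct non-NA grades; ave() recycles the count
# to every row of the group, and as.integer() undoes the character coercion
mydata$N <- as.integer(ave(mydata$grade, mydata$id,
                           FUN = function(g) length(unique(na.omit(g)))))
print(mydata)
```

The resulting N column should match the desired output given in the question.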