I apologize for the wording of the question and the errors. Newbie in OS and in R.
Problem: Find an efficient way to fill a column with numbers that uniquely identify observations with the same value in another column. The result would look like this:
   patient_number id
1              46  1
2              47  2
3              15  3
4              42  4
5              33  5
6              26  6
7              37  7
8               7  8
9              33  5
10             36  9
Sample data frame:
set.seed(42)
df <- data.frame(
  patient_number = sample(seq(1, 50, 1), 100, replace = TRUE)
)
What I was able to come up with:
df$id <- NA  ## create id and fill with NA to make the if statement easier
n_unique <- length(unique(df$patient_number))  ## number of unique patient numbers
for (i in seq_len(nrow(df))) {
  ## get the indices of the observations with the same patient_number
  index_identical <- which(df$patient_number == df$patient_number[i])
  ## if any of those observations does not have an id yet,
  if (any(is.na(df$id[index_identical]))) {
    ## assign the first integer between 1 and n_unique that is not used yet
    df$id[index_identical] <- setdiff(seq_len(n_unique), df$id)[1]
  }
}
It does the job, but with thousands of rows, it takes time.
Thanks for bearing with me.
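If you're open to other packages, you can use the group_indices function from the dplyr package (in dplyr 1.0.0 and later, calling group_indices with grouping columns is deprecated in favour of cur_group_id() after group_by()):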
library(dplyr)
df %>%
  mutate(id = group_indices(., patient_number))
   patient_number id
1              46 40
2              47 41
3              15 14
4              42 37
5              33 28
6              26 23
7              37 32
8               7  6
9              33 28
10             36 31
11             23 21
12             36 31
13             47 41
...
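Note that the ids in the output above appear to follow the sorted order of patient_number (46 gets id 40), whereas the expected result in the question numbers patients in order of first appearance (46 gets id 1). Both are valid unique identifiers. The difference can be sketched in base R on the first ten values from the question; the vector x and the names ids_sorted and ids_first are only illustrative, and as.integer(factor(x)) is used here as a stand-in for sorted group numbering, not as what dplyr does internally.

```r
# First ten patient numbers from the question's expected output
x <- c(46, 47, 15, 42, 33, 26, 37, 7, 33, 36)

# Ids that follow the sorted order of the values
# (factor levels are sorted by default)
ids_sorted <- as.integer(factor(x))

# Ids that follow the order of first appearance
ids_first <- match(x, unique(x))

print(data.frame(patient_number = x, ids_sorted, ids_first))
```

If the order of first appearance matters, the match-based form gives the numbering shown in the question.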
We can use .GRP from data.table:
library(data.table)
setDT(df)[, id := .GRP, patient_number]
Or, with base R, the match and factor options are fast as well:
df$id <- with(df, match(patient_number, unique(patient_number)))
df$id <- with(df, as.integer(factor(patient_number,
                                    levels = unique(patient_number))))
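As a quick sanity check, the two base R options can be compared on the sample data frame from the question. This is only a sketch: the names id_match and id_factor are illustrative, and the exact sampled values depend on the R version because the default sampling method changed in R 3.6.0, so no output is shown.

```r
# Rebuild the sample data frame from the question
set.seed(42)
df <- data.frame(
  patient_number = sample(seq(1, 50, 1), 100, replace = TRUE)
)

# The two base R approaches from above
id_match <- match(df$patient_number, unique(df$patient_number))
id_factor <- as.integer(factor(df$patient_number,
                               levels = unique(df$patient_number)))

# Both approaches should produce exactly the same integer vector
stopifnot(identical(id_match, id_factor))

# Every patient_number should map to exactly one id
ids_per_patient <- tapply(id_match, df$patient_number,
                          function(v) length(unique(v)))
stopifnot(all(ids_per_patient == 1))

# The ids should run from 1 to the number of unique patient numbers
n_unique <- length(unique(df$patient_number))
stopifnot(setequal(id_match, seq_len(n_unique)))
```

Both versions are vectorised and avoid the repeated which() scan over every row that the loop in the question performs, which is likely why they scale better to thousands of rows.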