I am interested in de-identifying a sensitive data set with both time-fixed and time-variant values. I want to (a) group all cases by social security number, (b) assign those cases a unique ID and then (c) remove the social security number.
Here's an example data set:
    personal_id  gender  temperature
    111-11-1111  M       99.6
    999-999-999  F       98.2
    111-11-1111  M       97.8
    999-999-999  F       98.3
    888-88-8888  F       99.0
    111-11-1111  M       98.9
Any solutions would be very much appreciated.
The group_by() function from the dplyr package groups the rows of a data frame by the values of one or more columns, much like the GROUP BY clause in SQL. Combined with summarise(), it collects identical values into groups and computes aggregates for each group, over one column or several.
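As a rough illustration of grouping on more than one column and then aggregating, here is a minimal sketch. The data frame name readings and the output column names n_readings and mean_temperature are assumptions made for this example; the values are copied from the question's sample data.

```r
library(dplyr)

# Hypothetical data frame mirroring the example data in the question.
readings <- data.frame(
  personal_id = c("111-11-1111", "999-999-999", "111-11-1111",
                  "999-999-999", "888-88-8888", "111-11-1111"),
  gender      = c("M", "F", "M", "F", "F", "M"),
  temperature = c(99.6, 98.2, 97.8, 98.3, 99.0, 98.9)
)

# Group by two columns, then compute one row of aggregates per group.
summary_tbl <- readings %>%
  group_by(personal_id, gender) %>%
  summarise(
    n_readings       = n(),               # rows in each group
    mean_temperature = mean(temperature), # average within each group
    .groups          = "drop"             # return an ungrouped result
  )

print(summary_tbl)
```

The result has one row per distinct personal_id and gender combination, so it summarises the groups rather than labelling the original rows.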
An ID variable is a variable that identifies each entity in a dataset (person, household, etc.) with a distinct value.
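To make the idea of a distinct value per entity concrete, here is a small base R sketch that needs no packages. It is only one possible approach, not the method used in the answers; the vector name personal_id simply reuses the question's column name, and the IDs are numbered in order of first appearance.

```r
# Values copied from the question's example data.
personal_id <- c("111-11-1111", "999-999-999", "111-11-1111",
                 "999-999-999", "888-88-8888", "111-11-1111")

# match() returns, for each element, its position in the vector of
# distinct values, so repeated identifiers receive the same integer ID.
ids <- match(personal_id, unique(personal_id))

# Sanity check: there are exactly as many distinct IDs as distinct people.
stopifnot(length(unique(ids)) == length(unique(personal_id)))

print(ids)
```

Numbering by first appearance, rather than by sorted identifier, also means the ID order does not reflect the ordering of the original identifiers.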
dplyr has a group_indices() function for creating unique group IDs:
    library(dplyr)

    data <- data.frame(personal_id = c("111-111-111", "999-999-999", "222-222-222", "111-111-111"),
                       gender = c("M", "F", "M", "M"),
                       temperature = c(99.6, 98.2, 97.8, 95.5))

    # Assign one integer ID per distinct personal_id
    data$group_id <- data %>% group_indices(personal_id)

    # Drop the identifying column
    data <- data %>% select(-personal_id)

    data
      gender temperature group_id
    1      M        99.6        1
    2      F        98.2        3
    3      M        97.8        2
    4      M        95.5        1
Or within the same pipeline (https://github.com/tidyverse/dplyr/issues/2160):
    data %>% mutate(group_id = group_indices(., personal_id))
dplyr::group_indices() is deprecated as of dplyr 1.0.0. dplyr::cur_group_id() should be used instead:
    df %>%
      group_by(personal_id) %>%
      mutate(group_id = cur_group_id())

      personal_id gender temperature group_id
      <chr>       <chr>        <dbl>    <int>
    1 111-11-1111 M             99.6        1
    2 999-999-999 F             98.2        3
    3 111-11-1111 M             97.8        1
    4 999-999-999 F             98.3        3
    5 888-88-8888 F             99          2
    6 111-11-1111 M             98.9        1
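The cur_group_id() pipeline above still leaves personal_id in the result, so step (c) of the question is not yet done. A minimal sketch of all three steps together might look like the following, assuming df holds the question's example data; the name deidentified is made up for this example.

```r
library(dplyr)

# Assumed reconstruction of the question's example data.
df <- data.frame(
  personal_id = c("111-11-1111", "999-999-999", "111-11-1111",
                  "999-999-999", "888-88-8888", "111-11-1111"),
  gender      = c("M", "F", "M", "F", "F", "M"),
  temperature = c(99.6, 98.2, 97.8, 98.3, 99.0, 98.9)
)

deidentified <- df %>%
  group_by(personal_id) %>%              # (a) group cases by identifier
  mutate(group_id = cur_group_id()) %>%  # (b) one integer ID per group
  ungroup() %>%                          # drop the grouping before removing its column
  select(-personal_id)                   # (c) remove the identifier

print(deidentified)
```

Calling ungroup() before select() matters here: dplyr keeps grouping columns in a grouped data frame, so select(-personal_id) on its own would add the column back.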