I often conduct research on human participants. For various reasons, my preliminary identifier is sometimes a composite of information that reduces anonymity in the data (e.g., I might concatenate a string that includes the date and time of completion, the IP address, and some information supplied by the participant).
Thus, if the data is to be shared in some form, a cleansed ID, stripped of such information, needs to be created from the preliminary ID. A simple approach in R is just to assign consecutive numbers (e.g., df$id <- seq(nrow(df)), where df is the data.frame).
However, this causes problems if more data is collected during the initial phase of research or if the rows are re-sorted: the cleansed ID assigned to a given participant may change each time the raw dataset is updated. This in turn can break subsequent analyses on the cleansed dataset, for example analyses that filter cases by cleansed ID.
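To make the problem concrete, here is a small base-R sketch using hypothetical raw IDs (`p_alice`, `p_bob`, and so on are made up for illustration). A new participant arrives, the rows are re-sorted, and every existing participant's positional ID shifts by one:

```r
# Hypothetical raw IDs in the order they were first collected
v1 <- data.frame(raw_id = c("p_alice", "p_bob", "p_carol"),
                 stringsAsFactors = FALSE)
v1$id <- seq(nrow(v1))

# Later, a new participant arrives and the rows are re-sorted by raw_id
v2 <- data.frame(raw_id = c("p_alice", "p_bob", "p_carol", "p_aaron"),
                 stringsAsFactors = FALSE)
v2 <- v2[order(v2$raw_id), , drop = FALSE]
v2$id <- seq(nrow(v2))

# Positional IDs of the original three participants before and after the update
cmp <- merge(v1, v2, by = "raw_id", suffixes = c("_v1", "_v2"))
print(cmp)
```

Any filter written against `id_v1` (say, "drop participant 2") would silently hit a different person after the update.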
Thus, I thought about creating a hash using the digest function in the digest package.
library(digest)
df$id <- sapply(df$raw_id, digest, USE.NAMES = FALSE)
This would seem to give a reliable way of going from raw identifier to cleansed identifier, while making it impossible for anyone who only possessed the cleansed identifier to recover the raw identifier.
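The property being relied on here is determinism: the same raw ID always hashes to the same value, regardless of row order or which other rows are present. A minimal sketch, assuming the digest package is installed; `hash_ids` and the example IDs are hypothetical. Unlike the one-liner above, it passes `serialize = FALSE`, so the string's characters are hashed directly rather than R's serialized representation of the object, which makes the result a plain SHA-1 of the text.

```r
library(digest)

# Hash each raw identifier separately. serialize = FALSE hashes the bytes of
# the string itself rather than R's serialized representation of the object.
hash_ids <- function(x) {
  vapply(x, function(s) digest(s, algo = "sha1", serialize = FALSE),
         character(1), USE.NAMES = FALSE)
}

# Hypothetical raw IDs, before and after a new participant arrives and rows are re-sorted
before <- data.frame(raw_id = c("p_alice", "p_bob", "p_carol"),
                     stringsAsFactors = FALSE)
after <- data.frame(raw_id = c("p_aaron", "p_alice", "p_bob", "p_carol"),
                    stringsAsFactors = FALSE)
before$id <- hash_ids(before$raw_id)
after$id <- hash_ids(after$raw_id)

# Each existing participant keeps the same cleansed ID in both versions
same <- before$id == after$id[match(before$raw_id, after$raw_id)]
print(same)
```

Determinism alone does not settle whether the raw ID can be recovered; that depends on how guessable the inputs are.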
However, given that I am new to both the digest function and hashing in general, I wanted to ask:
Is digest suitable for stripping IDs of identifying information?
… digest for this purpose?

Anonymization can be performed via a range of techniques, including encryption, term or character shuffling, or dictionary substitution.
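Of the techniques listed, dictionary substitution is perhaps the easiest to sketch. The idea, under the assumptions here, is to keep a private key table that maps each raw ID to a random code and to extend it (never rebuild it) as new data arrive, so existing participants keep their codes. This is base R only; `update_key` and the `P000001` code format are hypothetical. Because the codes are random, they cannot be recomputed from the raw ID, but the key table itself must be stored securely and never shared with the data.

```r
# Hypothetical helper: add new raw IDs to a private key table, giving each a
# random code. IDs already in the table keep their existing code.
update_key <- function(key, raw_ids) {
  new_ids <- setdiff(unique(raw_ids), key$raw_id)
  if (length(new_ids) == 0) return(key)
  # Candidate codes that have not been used yet (code format is an assumption)
  pool <- setdiff(sprintf("P%06d", 1:999999), key$code)
  new_codes <- sample(pool, length(new_ids))
  rbind(key, data.frame(raw_id = new_ids, code = new_codes,
                        stringsAsFactors = FALSE))
}

# Start with an empty key table; store it securely, separate from shared data
key <- data.frame(raw_id = character(0), code = character(0),
                  stringsAsFactors = FALSE)
key <- update_key(key, c("p_alice", "p_bob"))
first_codes <- key$code

# A later data pull adds a participant and arrives in a different order
key <- update_key(key, c("p_carol", "p_bob", "p_alice"))

# Cleansed IDs for a data frame are then a simple lookup
df <- data.frame(raw_id = c("p_bob", "p_carol"), stringsAsFactors = FALSE)
df$id <- key$code[match(df$raw_id, key$raw_id)]
```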
I have learnt many helpful things from the comments above. This answer aims to distill these comments.
There are two issues with hashing for the purpose of anonymising research participant identifiers:
Thus, to summarise the recommendations that I've gathered:
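library(digest)

# Salted SHA-1 hash of each raw ID. The salt should be kept secret and
# stored separately from any data that is shared.
hashed_id <- function(x, salt) {
  y <- paste(x, salt)
  # vapply() returns an unnamed character vector, so the raw IDs are not
  # carried along as names
  vapply(y, function(s) digest(s, algo = "sha1"), character(1),
         USE.NAMES = FALSE)
}

mydata$id <- hashed_id(mydata$raw_id, "somesalt1234")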
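A related, standard way to combine a secret with the input is a keyed hash (HMAC), which the digest package exposes as `hmac()`. The sketch below swaps HMAC-SHA256 in for `paste(x, salt)` plus SHA-1; it is an alternative, not the method gathered above. `hmac_id`, the placeholder secret, and the example IDs are hypothetical. It also illustrates why the secret matters: without one, anyone who can guess raw IDs can hash the guesses and look for matches.

```r
library(digest)

# Keyed hash (HMAC-SHA256) of each raw ID. The key plays the role of the salt
# but is combined with the input by the HMAC construction.
hmac_id <- function(x, key) {
  vapply(x, function(s) hmac(key, s, algo = "sha256"),
         character(1), USE.NAMES = FALSE)
}

# Hypothetical secret: in practice a long random string kept out of the shared data
secret <- "replace-with-a-long-random-secret"
raw_ids <- c("p_alice", "p_bob")
shared <- hmac_id(raw_ids, secret)

# Without a secret, guessable IDs can be hashed and looked up:
# the unkeyed SHA-256 of the guess "p_bob" is found among the unkeyed IDs.
unkeyed <- vapply(raw_ids, digest, character(1), algo = "sha256",
                  serialize = FALSE, USE.NAMES = FALSE)
guesses <- c("p_bob", "p_dave")
found_unkeyed <- vapply(guesses, digest, character(1), algo = "sha256",
                        serialize = FALSE, USE.NAMES = FALSE) %in% unkeyed

# The same guesses do not match the keyed IDs unless the secret is known
found_keyed <- hmac_id(guesses, "wrong-guess-at-secret") %in% shared
print(found_unkeyed)  # TRUE FALSE
print(found_keyed)    # FALSE FALSE
```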