Here's a data frame containing a column of user ids:
> head(df)
uid
1 14070210
2 14080815
3 14091420
For the sake of argument, I want to create a new column containing the square root of the user id, and another new column containing a hash of the user id. So I do this:
df_mutated <- df %>%
  mutate(sqrt_uid = sqrt(uid), hashed_uid = digest(uid))
... where digest() comes from the digest package.
While the square root appears to work, the digest function returns the same value for each user id.
> head(df_mutated)
uid sqrt_uid hashed_uid
1 14070210 3751.028 f8c4b39403e57d85cd1698d2353954d0
2 14080815 3752.441 f8c4b39403e57d85cd1698d2353954d0
3 14091420 3753.854 f8c4b39403e57d85cd1698d2353954d0
This is weird to me. Without dplyr, the digest() function returns different values for different inputs. What am I not understanding about dplyr?
Thanks
The digest() function isn't vectorized. If you pass in a vector, you get one hash for the whole vector rather than a digest for each element. Since mutate() receives the entire uid column at once, digest() returns a single value, and that value is recycled for every row of your data.frame. You can create your own vectorized version:
vdigest <- Vectorize(digest)
df %>% mutate(sqrt_uid = sqrt(uid), hashed_uid = vdigest(uid))
# uid sqrt_uid hashed_uid
# 1 14070210 3751.028 cc90019421220a24f75b5ed5daec36ff
# 2 14080815 3752.441 9f7f643940b692dd9c7effad439547e8
# 3 14091420 3753.854 89e6666fdfdbfb532b2d7940def9d47d
This matches what you get when you pass in each vector element individually:
digest(df$uid[1])
# [1] "cc90019421220a24f75b5ed5daec36ff"
digest(df$uid[3])
# [1] "89e6666fdfdbfb532b2d7940def9d47d"
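As an alternative sketch (not part of the original answer), the same per-element hashing can be written with base R's vapply() instead of Vectorize(). The sample data frame below is rebuilt from the values in the question and assumes uid is stored as a numeric column; the variable name whole_vector_hash is only illustrative.

```r
library(digest)

# Sample data rebuilt from the question (assumes uid is numeric)
df <- data.frame(uid = c(14070210, 14080815, 14091420))

# Passing the whole column gives ONE hash for the entire vector,
# which is why mutate() recycled the same value onto every row
whole_vector_hash <- digest(df$uid)
length(whole_vector_hash)

# vapply() calls digest() once per element and enforces that each
# call returns exactly one character string
df$hashed_uid <- vapply(df$uid, digest, character(1), USE.NAMES = FALSE)

print(df)
```

Inside a dplyr pipeline the same call should work directly, e.g. mutate(hashed_uid = vapply(uid, digest, character(1))). Note that digest() serializes the R object by default before hashing, so the result depends on the stored type: a uid held as a number is likely to hash differently from the same uid held as a character string.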