I've got a tibble that looks like this:
library(dplyr)
df <- tibble(name = c('john','bill','amy'),
             name.2 = c('johhn','ball','ammy'))
df
# A tibble: 3 x 2
name name.2
<chr> <chr>
1 john johhn
2 bill ball
3 amy ammy
I want to add a column that shows the difference between the name and name.2 columns, like this:
df %>%
mutate(diff = c('h','a','m'))
# A tibble: 3 x 3
name name.2 diff
<chr> <chr> <chr>
1 john johhn h
2 bill ball a
3 amy ammy m
I'd prefer a solution that uses tidyverse and stringr if possible, but I'll take whatever works.
Using base R, we can do something like:
diffc <- diag(attr(adist(df$name, df$name.2, counts = TRUE), "trafos"))
transform(df, diff = regmatches(name.2, regexpr("[^M]", diffc)))
name name.2 diff
1 john johhn h
2 bill ball a
3 amy ammy m
Breakdown:
Compute the approximate string distance between df$name and df$name.2, keeping the edit operations (counts = TRUE):
d <- adist(df$name, df$name.2, counts = TRUE)
Take the diagonal of the "trafos" attribute, which holds the edit transcript for each row's pair of names (M = match, I = insert, D = delete, S = substitute):
e <- diag(attr(d, "trafos"))
Find the position of the first character that is deleted, substituted, or inserted, i.e. not matched:
f <- regexpr("[^M]", e)
Extract the characters of df$name.2 at those positions:
df$diff <- regmatches(df$name.2, f)
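Since the question asked for something closer to tidyverse and stringr, the same steps could be wrapped roughly like this. This is only a sketch: `first_diff` is a name made up here, not part of any package, and it assumes the difference you care about is the first inserted or substituted character. It calls `adist()` once per row instead of building the full n x n matrix, and it returns `NA` when the two strings are identical. For a pure deletion the result is simply whatever character of the second string sits at that position, so treat that case with care.

```r
library(dplyr)
library(purrr)
library(stringr)

# Hypothetical helper (not from the original answer): for each pair (a[i], b[i]),
# return the character of b[i] at the first non-matching edit operation.
first_diff <- function(a, b) {
  # Edit transcript per pair: a string of M (match), I, D, S operations
  trafos <- map2_chr(a, b, function(x, y) {
    attr(adist(x, y, counts = TRUE), "trafos")[1, 1]
  })
  # Position of the first operation that is not a match; NA if there is none
  pos <- str_locate(trafos, "[^M]")[, "start"]
  # Character of b at that position; an NA position gives NA
  str_sub(b, pos, pos)
}

df <- tibble(name   = c("john", "bill", "amy"),
             name.2 = c("johhn", "ball", "ammy"))

result <- df %>%
  mutate(diff = first_diff(name, name.2))
result
```

For the three example rows this should give the same diff column as the base R version (h, a, m). Unlike `regmatches()`, which silently drops elements with no match, `str_sub()` keeps one value per row, so the result always lines up with the data.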