I have a data frame with two string variables that have an equal number of characters. These strings represent student responses to an exam. The first string contains a + sign for each question answered correctly and the student's (incorrect) response for each question answered incorrectly. The second string contains all the correct answers. I want to replace all the + signs in the first string with the correct answer from the second string. A simplified example data set can be created with this code:
df <- data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"),
v2 = c("DBBAD", "BDCAD","CDCCA"), stringsAsFactors = FALSE)
So the + signs in df$v1 need to be replaced with the letters in df$v2 that are the same distance from the start of the string. Any ideas?
When df$v1 and df$v2 are character vectors (not factors), we may use
regmatches(df$v1, gregexpr("\\+", df$v1)) <- regmatches(df$v2, gregexpr("\\+", df$v1))
That is,
df <- data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"),
                 v2 = c("DBBAD", "BDCAD", "CDCCA"),
                 stringsAsFactors = FALSE)
rg <- gregexpr("\\+", df$v1)
regmatches(df$v1, rg) <- regmatches(df$v2, rg)
df
#      v1    v2
# 1 DAAAB DBBAD
# 2 DDCCC BDCAD
# 3 ADBAD CDCCA
rg contains the positions of "+" in df$v1, and we conveniently exploit regmatches to replace those matches in df$v1 with whatever is in df$v2 at the same positions.
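If it helps to see the "same positions" idea spelled out, here is a minimal sketch of a character-by-character alternative in base R. It is not part of the answer above: fill_plus is a hypothetical helper name, and it assumes each pair of strings has the same number of characters, as stated in the question.

```r
df <- data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"),
                 v2 = c("DBBAD", "BDCAD", "CDCCA"),
                 stringsAsFactors = FALSE)

# Hypothetical helper (an illustration, not from the original answer):
# fills the "+" characters of one response string from the answer key.
# Assumes resp and key have the same number of characters.
fill_plus <- function(resp, key) {
  r <- strsplit(resp, "", fixed = TRUE)[[1]]  # response as single characters
  k <- strsplit(key, "", fixed = TRUE)[[1]]   # key as single characters
  hit <- r == "+"                             # positions answered correctly
  r[hit] <- k[hit]                            # copy the key at those positions
  paste(r, collapse = "")
}

# Apply the helper row by row; unname() drops the names mapply adds
filled <- unname(mapply(fill_plus, df$v1, df$v2))
filled
# [1] "DAAAB" "DDCCC" "ADBAD"
```

This gives the same strings as the regmatches approach for the example data. The regmatches version is shorter and stays vectorised, so the sketch is mainly useful for checking the result or for adapting the per-character logic.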