I would like to use R to compare written text and extract sections which differ between the elements.
Consider a and b, two text paragraphs. One is a modified version of the other:
a <- "This part is the same. This part is old."
b <- "This string is updated. This part is the same."
I want to compare the two strings and receive the part of the string which is unique to either of the two as output, preferably separate for both input strings.
Expected output:
stringdiff <- list(a = " This part is old.", b = "This string is updated. ")
> stringdiff
$a
[1] " This part is old."
$b
[1] "This string is updated. "
I've tried a solution from "Extract characters that differ between two strings", but it only compares unique characters. The answer in "Simple Comparing of two texts in R" comes closer, but still only compares unique words.
Is there any way to get the expected output without too much of a hassle?
We concatenate both strings into a named vector and split at the space after each . to create a list of sentences ('lst'). We then get the unique elements by unlisting 'lst' ('un1'). Finally, calling setdiff with x = un1 gives, for each string, the sentences in 'un1' that the string does not contain:
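# Split each string into sentences at the whitespace that follows a period
lst <- strsplit(c(a = a, b = b), "(?<=[.])\\s", perl = TRUE)
# All distinct sentences across both strings
un1 <- unique(unlist(lst))
# For each string, the sentences in 'un1' that it does not contain
lapply(lst, setdiff, x = un1)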
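Note that with x = un1 the call above returns the sentences each string is missing, so a yields "This string is updated." and b yields "This part is old.", which is the reverse of the labelling in the expected output. If you want the sentences unique to each string instead, one possible variant is to remove the shared sentences from each element. This is only a sketch: the names sentences and common are introduced here for illustration, and the surrounding whitespace shown in the expected output is not preserved, because the split consumes it.

```r
a <- "This part is the same. This part is old."
b <- "This string is updated. This part is the same."

# Split each string into sentences at the whitespace that follows a period
sentences <- strsplit(c(a = a, b = b), "(?<=[.])\\s", perl = TRUE)

# Sentences that occur in every input string
common <- Reduce(intersect, sentences)

# For each string, keep only the sentences that are not shared
stringdiff <- lapply(sentences, setdiff, y = common)
stringdiff
# $a
# [1] "This part is old."
#
# $b
# [1] "This string is updated."
```

Because Reduce(intersect, ...) works on a list of any length, the same approach should also work for more than two strings. In that case "common" means present in all of them.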