I have text data (in R) and want to replace some characters with other characters, using the replacement values in a data frame. I thought this would be an easy task: use strsplit on spaces to create a vector, match against it with %in%, and then paste the pieces back together. But then I thought about punctuation. There's no space between the last word of a sentence and the punctuation at the end.
I figure there's probably a simpler way to achieve what I want than the convoluted mess that my code is becoming. I would appreciate direction with this problem.
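To make the punctuation problem concrete, here is a minimal sketch of the strsplit idea. The names `words`, `symbols`, and `hit` are only illustrative and are not part of my real code.

```r
# Illustrative only: split on spaces and match tokens against the symbols.
x <- "I like 346 ice cream cones. They're 99 percent good! I ate 46."
symbols <- c("346", "99", "46")

words <- strsplit(x, " ", fixed = TRUE)[[1]]
hit <- words %in% symbols

# "346" and "99" are matched, but the last token is "46." (with the period),
# so it is not found among the symbols.
words[hit]
words[length(words)]
```

The final token keeps its trailing period, which is why exact matching on space-separated tokens misses it.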
#Character String
x <- "I like 346 ice cream cones. They're 99 percent good! I ate 46."
#Replacement Values Dataframe
  symbol text
1 "346"  "three hundred forty six"
2 "99"   "ninety nine"
3 "46"   "forty six"
#replacement dataframe
numDF <-
  data.frame(symbol = c("346", "99", "46"),
             text = c("three hundred forty six", "ninety nine", "forty six"),
             stringsAsFactors = FALSE)
Desired outcome:
[1] "I like three hundred forty six ice cream cones. They're ninety nine percent good! I ate forty six."
EDIT: I originally entitled this "conditional gsub" because that's what it seems like to me, even though there is no gsub involved.
Maybe this, inspired by Josh O'Brien's answer, does it:
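x <- "I like 346 ice cream cones. They're 99 percent good! I ate 46."
# Lookup table as a character matrix with columns "symbol" and "text"
numDF <- structure(c("346", "99", "46", "three hundred forty six", "ninety nine",
                     "forty six"), .Dim = c(3L, 2L), .Dimnames = list(c("1", "2",
                     "3"), c("symbol", "text")))
# One alternation pattern covering every symbol: "346|99|46"
pat <- paste(numDF[,"symbol"], collapse="|")
# Replace one match per pass until none is left.
# This terminates only if no replacement text contains a symbol.
repeat {
  m <- regexpr(pat, x)      # position of the first remaining match
  if(m==-1) break           # -1 means no match is left
  sym <- regmatches(x,m)    # the symbol that was matched
  regmatches(x,m) <- numDF[match(sym, numDF[,"symbol"]), "text"]
}
x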
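A loop over gsub with word boundaries might also work. This is only a sketch under some assumptions: the symbols contain no regex metacharacters (true for the digits here), and a symbol should only be replaced when it stands alone as a word. The names `ord` and `out` are illustrative.

```r
x <- "I like 346 ice cream cones. They're 99 percent good! I ate 46."
numDF <- data.frame(symbol = c("346", "99", "46"),
                    text = c("three hundred forty six", "ninety nine", "forty six"),
                    stringsAsFactors = FALSE)

# Handle longer symbols first; with \\b this is a precaution rather than
# a requirement, since "46" cannot match inside "346" at a word boundary.
ord <- order(nchar(numDF$symbol), decreasing = TRUE)

out <- x
for (i in ord) {
  # \\b anchors the symbol at word boundaries, so "46." still matches "46"
  pat <- paste0("\\b", numDF$symbol[i], "\\b")
  out <- gsub(pat, numDF$text[i], out, perl = TRUE)
}
out
```

Because the word boundary sits between the digit and the period, trailing punctuation is left in place and no splitting on spaces is needed.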