
Conditional gsub replacement

Tags: r

I have text data (in R) and want to replace some characters with other characters listed in a data frame. I thought this would be an easy task: use strsplit on spaces to create a vector that I can then match against (with %in%) and paste back together. But then I thought about punctuation: there's no space between the last word of a sentence and the punctuation at the end.

I figure there's probably a simpler way to achieve what I want than the convoluted mess that my code is becoming. I would appreciate direction with this problem.
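To illustrate the punctuation problem, here is a minimal sketch of the strsplit/%in% idea applied to the example string from this question. The names `words`, `symbols`, and `hits` are made up for illustration.

```r
x <- "I like 346 ice cream cones.  They're 99 percent good!  I ate 46."
symbols <- c("346", "99", "46")

# Split on runs of spaces; punctuation stays attached to the preceding word
words <- strsplit(x, " +")[[1]]

# Exact matching finds "346" and "99", but not "46." with its trailing period
hits <- words %in% symbols
words[hits]
```

The last number is missed because the token is "46." rather than "46", which is why plain splitting on spaces is not enough here.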

#Character String
x <- "I like 346 ice cream cones.  They're 99 percent good!  I ate 46."

#Replacement Values Dataframe
  symbol text                     
1 "346"  "three hundred forty six"
2 "99"   "ninety nine"            
3 "46"   "forty six" 

#replacement dataframe
numDF <- 
data.frame(symbol = c("346","99", "46"),
           text = c("three hundred forty six", "ninety nine","forty six"),
           stringsAsFactors = FALSE)

Desired outcome:

[1] "I like three hundred forty six ice cream cones.  They're ninety nine percent good!  I ate forty six."

EDIT: I originally titled this "conditional gsub" because that's what it seems like to me, even though there is no gsub involved.

Tyler Rinker asked Jan 02 '12 17:01


1 Answer

Maybe this, inspired by Josh O'Brien's answer, does it:

x <- "I like 346 ice cream cones.  They're 99 percent good!  I ate 46."
numDF <- structure(c("346", "99", "46", "three hundred forty six", "ninety nine", 
"forty six"), .Dim = c(3L, 2L), .Dimnames = list(c("1", "2", 
"3"), c("symbol", "text")))

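# Build one alternation pattern from all symbols: "346|99|46"
pat <- paste(numDF[,"symbol"], collapse="|")
repeat {
    m <- regexpr(pat, x)        # locate the first remaining match
    if(m==-1) break             # stop once nothing is left to replace
    sym <- regmatches(x,m)      # the symbol that matched
    # look up the matched symbol and swap in its text
    regmatches(x,m) <- numDF[match(sym, numDF[,"symbol"]), "text"]
}
x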
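
As a possible variation (a sketch, not part of the original answer), the same lookup can be done in one pass with gregexpr instead of a repeat loop. This assumes numDF is the data frame from the question rather than the matrix above. The `\\b` word boundaries are an added assumption, so that a symbol such as "46" can only match as a whole number and never inside a longer one.

```r
x <- "I like 346 ice cream cones.  They're 99 percent good!  I ate 46."
numDF <- data.frame(symbol = c("346", "99", "46"),
                    text = c("three hundred forty six", "ninety nine", "forty six"),
                    stringsAsFactors = FALSE)

# One pattern for all symbols, anchored at word boundaries
pat <- paste0("\\b(", paste(numDF$symbol, collapse = "|"), ")\\b")

m <- gregexpr(pat, x, perl = TRUE)   # positions of every match in x
syms <- regmatches(x, m)[[1]]        # the matched symbols, in order

# Replace each match with its text; the value is a list, one element per string
regmatches(x, m) <- list(numDF$text[match(syms, numDF$symbol)])
x
```

Punctuation is left untouched because only the matched digits are replaced.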
Karsten W. answered Sep 30 '22 08:09