Is there a regex pattern for preserving case, in the vein of \U and \L?
In the example below, I want to convert "date" to "month" while maintaining the capitalization used in the input:
from       to
"date" ~~> "month"
"Date" ~~> "Month"
"DATE" ~~> "MONTH"
I currently use three nested calls to sub to accomplish this.
input <- c("date", "Date", "DATE")
expected.out <- c("month", "Month", "MONTH")
sub("date", "month",
  sub("Date", "Month",
    sub("DATE", "MONTH", input)
  )
)
The goal is to have a single pattern and a single replacement, such as
gsub("(date)", "\\Umonth", input, perl=TRUE)
which would yield the desired output.
Using the gsubfn package, you could avoid the nested sub calls and do this in one call.
> library(gsubfn)
> x <- 'Here we have a date, a different Date, and a DATE'
> gsubfn('date', list('date'='month','Date'='Month','DATE'='MONTH'), x, ignore.case=TRUE)
# [1] "Here we have a month, a different Month, and a MONTH"
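For readers who would rather stay in base R, a similar idea can be sketched with gregexpr and `regmatches<-`: match case-insensitively, then pick the case of each replacement from the text that was actually matched. The helper name replace_keep_case is hypothetical (it is not part of base R or gsubfn), and the sketch assumes `to` is supplied in lower case. Mixed-case matches such as "DaTe" are treated as Capitalized or lower case depending on their first letter.

```r
# Hypothetical helper, not from any package: replace `from` with `to`,
# copying the case style (lower, Capitalized, UPPER) of each match.
replace_keep_case <- function(x, from, to) {
  m <- gregexpr(from, x, ignore.case = TRUE)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    vapply(hits, function(h) {
      if (h == toupper(h)) {
        toupper(to)                                   # DATE -> MONTH
      } else if (substr(h, 1, 1) == toupper(substr(h, 1, 1))) {
        paste0(toupper(substr(to, 1, 1)), substring(to, 2))  # Date -> Month
      } else {
        to                                            # date -> month
      }
    }, character(1), USE.NAMES = FALSE)
  })
  x
}

x <- "Here we have a date, a different Date, and a DATE"
replace_keep_case(x, "date", "month")
# [1] "Here we have a month, a different Month, and a MONTH"
```

Unlike the lookup list, this does not need every case variant spelled out in advance, at the cost of a few more lines.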
This is one of those occasions when I think a for loop is justified:
input <- rep("Here are a date, a Date, and a DATE",2)
pat <- c("date", "Date", "DATE")
ret <- c("month", "Month", "MONTH")
for(i in seq_along(pat)) { input <- gsub(pat[i],ret[i],input) }
input
#[1] "Here are a month, a Month, and a MONTH"
#[2] "Here are a month, a Month, and a MONTH"
And an alternative, courtesy of @flodel, implementing the same logic as the loop through Reduce:
Reduce(function(str, args) gsub(args[1], args[2], str),
       Map(c, pat, ret), init = input)
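One thing worth keeping in mind with both the loop and the Reduce version: the replacements run one after another, so a later pattern also sees text produced by an earlier replacement. That is harmless for date/month, but a small sketch with made-up pairs (pat2 and ret2 are illustrative only, not from the question) shows how it can chain:

```r
# Illustrative pairs only: the first replacement produces text
# that the second pattern then matches.
pat2 <- c("date", "month")
ret2 <- c("month", "year")
s <- "a date and a month"

for (i in seq_along(pat2)) {
  s <- gsub(pat2[i], ret2[i], s)
}
s
# [1] "a year and a year"
```

If that kind of chaining is a concern, a single-pass approach (such as the gsubfn call above) avoids it, since every match is taken from the original string.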
For some benchmarking of these options, see @TylerRinker's answer.