Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

regex for preserving case pattern, capitalization

Tags:

regex

r

Is there a regex for preserving case pattern in the vein of \U and \L?

In the example below, I want to convert "date" to "month" while maintaining the capitalization used in the input

   from        to
  "date" ~~> "month"
  "Date" ~~> "Month"
  "DATE" ~~> "MONTH"

I currently use three nested calls to sub to accomplish this.

input <- c("date", "Date", "DATE")
expected.out <- c("month", "Month", "MONTH")

sub("date", "month", 
  sub("Date", "Month", 
    sub("DATE", "MONTH", input)
  )
)

The goal is to have a single pattern and a single replace such as

gsub("(date)", "\\Umonth", input, perl=TRUE) 

which will yield the desired output

like image 309
Ricardo Saporta Avatar asked Oct 02 '14 23:10

Ricardo Saporta


People also ask

Are regex matches case sensitive?

By default, the comparison of an input string with any literal characters in a regular expression pattern is case-sensitive, white space in a regular expression pattern is interpreted as literal white-space characters, and capturing groups in a regular expression are named implicitly as well as explicitly.

What is the regex for special characters?

Special Regex Characters: These characters have special meaning in regex (to be discussed below): . , + , * , ? , ^ , $ , ( , ) , [ , ] , { , } , | , \ . Escape Sequences (\char): To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "."


2 Answers

Using the gsubfn package, you could avoid using nested sub functions and do this in one call.

> library(gsubfn)
> x <- 'Here we have a date, a different Date, and a DATE'
> gsubfn('date', list('date'='month','Date'='Month','DATE'='MONTH'), x, ignore.case=T)
# [1] "Here we have a month, a different Month, and a MONTH"
like image 38
hwnd Avatar answered Sep 24 '22 19:09

hwnd


This is one of those occasions when I think a for loop is justified:

input <- rep("Here are a date, a Date, and a DATE",2)
pat <- c("date", "Date", "DATE")
ret <- c("month", "Month", "MONTH")

for(i in seq_along(pat)) { input <- gsub(pat[i],ret[i],input) }
input
#[1] "Here are a month, a Month, and a MONTH" 
#[2] "Here are a month, a Month, and a MONTH"

And an alternative courtesy of @flodel implementing the same logic as the loop through Reduce:

Reduce(function(str, args) gsub(args[1], args[2], str), 
       Map(c, pat, ret), init = input)

For some benchmarking of these options, see @TylerRinker's answer.

like image 144
thelatemail Avatar answered Sep 22 '22 19:09

thelatemail