
Double Colon in R Regular Expression

Tags: regex, r

The goal is to remove every character that is not a capital letter from a string. I managed to find a regular expression that does this, but I don't fully understand it.

> gsub("[^::A-Z::]","", "PendingApproved")
[1] "PA"

I tried reading R's regex documentation, but the double colon isn't really covered there.

I understand that [] encloses the set of characters to match, A-Z means upper-case letters, and ^ means "not". Can someone help me understand what the double colons are doing there?

B.Mr.W. Avatar asked Mar 05 '23 17:03



1 Answer

As far as I know, you don't need those double colons:

gsub("[^A-Z]", "", "PendingApproved")
[1] "PA"

Your current pattern removes any character that is not an upper-case letter A-Z or a colon (:). Repeating the colon on either side of the character range adds no extra logic: listing a character more than once inside a bracket expression has the same effect as listing it once.
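To make the difference visible, here is a minimal sketch that runs both patterns on a string that actually contains colons. The input string and the variable names are made up for illustration and are not from the question.

```r
# Hypothetical input containing colons, to expose the difference
# between the two patterns.
x <- "Pending:Approved::OK"

# Colons are listed inside the negated class, so they are kept.
with_colons <- gsub("[^::A-Z::]", "", x)

# Only A-Z is kept; colons are removed like any other character.
without_colons <- gsub("[^A-Z]", "", x)

print(with_colons)    # "P:A::OK"
print(without_colons) # "PAOK"
```

On "PendingApproved" the two patterns give the same result only because that string contains no colons.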

Perhaps the author of the code you are using confused the double colons with R's own regex syntax for named (POSIX) character classes. For example, we could have written the above as:

gsub("[^[:upper:]]","", "PendingApproved")

where [:upper:] matches any upper-case letter. Note that a named class is only valid inside a bracket expression, hence the doubled brackets.
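As a quick check, the sketch below compares the range form and the named-class form on a few ASCII strings. The inputs vector and the variable names are illustrative assumptions. For non-ASCII text the two forms may differ, since what [:upper:] matches depends on the locale.

```r
# Hypothetical ASCII inputs; gsub() is vectorised over its third argument.
inputs <- c("PendingApproved", "inReview", "DONE")

# Explicit range of upper-case ASCII letters.
by_range <- gsub("[^A-Z]", "", inputs)

# POSIX named class; note the class sits inside an outer bracket expression.
by_class <- gsub("[^[:upper:]]", "", inputs)

print(by_range)                      # "PA" "R" "DONE"
print(identical(by_range, by_class)) # TRUE for these ASCII inputs
```

R's ?regex help page advises preferring the named classes, because the interpretation of ranges such as A-Z is locale- and implementation-dependent.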


Tim Biegeleisen Avatar answered Mar 16 '23 13:03
