
Using shorthand character classes inside character classes in R regex

Tags: regex, r, gsub

I have defined

vec <- "5f 110y, Fast"

and

gsub("[\\s0-9a-z]+,", "", vec)

gives "5f  Fast"

I would have expected it to give "Fast" since everything before the comma should get matched by the regex.

Can anyone explain to me why this is not the case?

ThanksABundle asked Jul 19 '18 11:07




2 Answers

Keep in mind that in TRE regex patterns (TRE is the engine R uses when perl=TRUE is not set), you cannot use regex escapes like \s, \d, \w inside bracket expressions.

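So the regex in your case, "[\\s0-9a-z]+,", matches one or more characters that are each a \, an s, a digit or a lowercase ASCII letter, followed by a single comma. The space in vec is not in that set, so only "110y," is matched and "5f " is left in place.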
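To see this for yourself, here is a small sketch that extracts the matched text instead of replacing it. The variable names tre_match and pcre_match are just illustrative, and the perl = TRUE call is included only for comparison.

```r
vec <- "5f 110y, Fast"
pattern <- "[\\s0-9a-z]+,"

# Default engine (TRE): the space is not in the bracket expression,
# so the match cannot start before "110y".
tre_match <- regmatches(vec, regexpr(pattern, vec))

# perl = TRUE (PCRE): \s inside the brackets is read as whitespace,
# so the match starts at the beginning of the string.
pcre_match <- regmatches(vec, regexpr(pattern, vec, perl = TRUE))

print(tre_match)   # "110y,"
print(pcre_match)  # "5f 110y,"
```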

You may use POSIX character classes instead, like [:space:] (any whitespace) or [:blank:] (horizontal whitespace, i.e. spaces and tabs):

> gsub("[[:space:]0-9a-z]+,", "", vec)
[1] " Fast"
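If the difference between the two POSIX classes is unclear, this minimal sketch may help. The test string x is made up for illustration; it contains a tab, a newline and a space.

```r
# Hypothetical input: "a", tab, "b", newline, "c", space, "d"
x <- "a\tb\nc d"

# [:blank:] covers only horizontal whitespace (space and tab),
# so the newline survives.
no_blank <- gsub("[[:blank:]]", "", x)

# [:space:] also covers the newline (and other vertical whitespace).
no_space <- gsub("[[:space:]]", "", x)

print(no_blank)  # "ab\ncd"
print(no_space)  # "abcd"
```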

Or use a PCRE regex with \s by passing the perl=TRUE argument:

> gsub("[\\s0-9a-z]+,", "", vec, perl=TRUE)
[1] " Fast"

To make \s match all Unicode whitespace, add the (*UCP) PCRE verb at the start of the pattern: gsub("(*UCP)[\\s0-9a-z]+,", "", vec, perl=TRUE).
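Note that both calls above return " Fast" with a leading space, while the question expected "Fast". One possible way to get that, sketched here under the assumption that any whitespace right after the comma should go too, is to consume it in the pattern or to trim the result afterwards (trimws() is available in base R from version 3.2.0).

```r
vec <- "5f 110y, Fast"

# POSIX classes with the default engine: also eat whitespace after the comma.
posix_result <- gsub("[[:space:]0-9a-z]+,[[:space:]]*", "", vec)

# Same idea with PCRE, where \s works inside and outside the brackets.
pcre_result <- gsub("[\\s0-9a-z]+,\\s*", "", vec, perl = TRUE)

# Alternative: keep the original pattern and trim the result.
trimmed <- trimws(gsub("[[:space:]0-9a-z]+,", "", vec))

print(posix_result)  # "Fast"
print(pcre_result)   # "Fast"
print(trimmed)       # "Fast"
```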

Wiktor Stribiżew answered Sep 18 '22 14:09


Could you please try the following and let me know if it helps?

vec <- c("5f 110y, Fast")
gsub(".*,","",vec)

OR

gsub("[[:alnum:]]+ [[:alnum:]]+,","",vec)
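One thing to be aware of with the first pattern: .* is greedy, so it removes everything up to the last comma, not the first. A rough sketch of the difference follows; the string multi is a made-up input with two commas, and "[^,]*," is just one possible way to stop at the first comma.

```r
vec <- "5f 110y, Fast"
multi <- "5f 110y, Fast, Good"  # hypothetical input with two commas

# On the original input both patterns leave " Fast" (leading space kept).
greedy_vec <- gsub(".*,", "", vec)
alnum_vec <- gsub("[[:alnum:]]+ [[:alnum:]]+,", "", vec)

# Greedy .* runs to the LAST comma.
greedy_multi <- gsub(".*,", "", multi)

# [^,]* cannot cross a comma, so sub() removes only up to the FIRST one.
first_only <- sub("[^,]*,", "", multi)

print(greedy_vec)    # " Fast"
print(alnum_vec)     # " Fast"
print(greedy_multi)  # " Good"
print(first_only)    # " Fast, Good"
```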
RavinderSingh13 answered Sep 20 '22 14:09