
What is the difference between `\\s|*` and `\\s|[*]` in a regular expression in R?

Tags:

regex

r

What is the difference between \\s|* and \\s|[*] in a regular expression in R?

> gsub('\\s|*','','Aug 2013*')
[1] "Aug2013*"
> gsub('\\s|[*]','','Aug 2013*')
[1] "Aug2013"

What is the function of [ ] here?

showkey asked Nov 03 '13 02:11


1 Answer

The first expression is not valid as written, because * is a special character (a quantifier) and here it has nothing to repeat. If you want sub or gsub to treat special characters literally, set fixed = TRUE.

This takes the pattern string exactly as it is and ignores the special meaning of any characters in it.

See Pattern Matching and Replacement in the R documentation.

x <- 'Aug 2013****'
gsub('*', '', x, fixed=TRUE)
#[1] "Aug 2013"
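
Note that fixed = TRUE applies to the whole pattern, so \\s would be taken literally as well. A minimal sketch, assuming the goal is to strip both whitespace and asterisks as in the question (nesting two calls is just one illustrative option, and the variable names are made up for this example):

```r
x <- 'Aug 2013*'

# With fixed = TRUE the pattern is the literal 4-character string \s|*,
# which does not occur in x, so nothing is replaced.
unchanged <- gsub('\\s|*', '', x, fixed = TRUE)
unchanged
#[1] "Aug 2013*"

# Remove the literal asterisks first, then the whitespace with a regex.
no_star <- gsub('*', '', x, fixed = TRUE)
cleaned <- gsub('\\s', '', no_star)
cleaned
#[1] "Aug2013"
```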

Your second expression simply puts * inside a character class [] so that it does not need escaping. It is the same as:

x <- 'Aug 2013*'
gsub('\\s|\\*', '', x)
#[1] "Aug2013"

As for the explanation of your first expression: \\s|*

\s      whitespace (\n, \r, \t, \f, and " ")
|       OR
*       quantifier with nothing before it to repeat, so it does not match a literal '*'

And the second expression: \\s|[*]

\s      whitespace (\n, \r, \t, \f, and " ")
|       OR
[*]     any character of: '*'
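
The equivalent spellings can be checked side by side. This is a small sketch; the [[:space:]*] form is an additional variant not used in the question, which puts the POSIX whitespace class and the asterisk into a single character class:

```r
x <- 'Aug 2013*'

# Asterisk inside a character class: no escaping needed.
a <- gsub('\\s|[*]', '', x)

# Asterisk escaped with a backslash (doubled inside an R string).
b <- gsub('\\s|\\*', '', x)

# One character class holding both whitespace and the asterisk.
d <- gsub('[[:space:]*]', '', x)

c(a, b, d)
#[1] "Aug2013" "Aug2013" "Aug2013"
```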

hwnd answered Oct 06 '22 00:10