Regular expression "\\|" in strsplit

Tags: string, regex, r

> str <- "AAC|Australia Acquisition Corp. - Ordinary Shares|S|N|D|100"
> strsplit(str, "\\|")
[[1]]
[1] "AAC"
[2] "Australia Acquisition Corp. - Ordinary Shares"
[3] "S"
[4] "N"
[5] "D"
[6] "100"

Is \\| equivalent to |?
Or is \\| equivalent to \|?
Why does strsplit(str, "\\|") work?

Fnzh Xx asked Nov 30 '22 02:11


2 Answers

Since | has a special meaning in regular expressions, it needs to be escaped, so to match a literal | the actual regular expression is

\|

Since \ in turn is a special character when declaring string literals (you probably recognize it from \n etc.), the \ needs to be escaped itself. I.e., in order to create a string literal containing \| you need

\\|
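
The string-literal layer described above can be checked directly in R. This is a minimal sketch, not part of the original answer; the variable names are illustrative, and the raw-string line assumes R 4.0.0 or later.

```r
# The literal "\\|" is parsed by R into a two-character string: \ and |
pattern <- "\\|"
n_chars <- nchar(pattern)      # number of characters actually stored
writeLines(pattern)            # prints the string as the regex engine sees it

# Assumption: R >= 4.0.0, where raw strings avoid the string-level escaping
raw_pattern <- r"(\|)"

# Splitting on the escaped pipe (shortened sample input)
parts <- strsplit("AAC|S|N", pattern)[[1]]
print(parts)
```

Here `writeLines()` is used rather than `print()` because `print()` shows the string in its escaped form, which hides the difference between the two layers.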
aioobe answered Dec 01 '22 14:12


Because it's a quoted string. In a quoted string, you can include a " character by escaping it with a \, so a \ itself must also be escaped to produce a single literal backslash. Your quoted string "\\|" therefore holds the two characters \|.

In a regular expression, | is a special character (alternation) that is not matched literally unless it is escaped. Regular expressions in R also escape with a backslash, so the string literal "\\|" is the string \|, which is an expression matching exactly |. That is why "\\|" works: it matches the literal | that separates the fields in the string you're splitting.
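
To illustrate the regex layer, here is a small sketch comparing the escaped pattern with two other common ways of matching a literal pipe, plus the unescaped pattern for contrast. The sample input and variable names are illustrative, not from the original answer.

```r
x <- "AAC|S|N"

# Escaped pipe: the regex \| matches a literal |
escaped <- strsplit(x, "\\|")[[1]]

# Character class: inside [...] the pipe has no special meaning
bracket <- strsplit(x, "[|]")[[1]]

# fixed = TRUE: the split argument is treated as plain text, not a regex
literal <- strsplit(x, "|", fixed = TRUE)[[1]]

# Unescaped pipe: interpreted as alternation between two empty patterns,
# so it does not split on the separator as intended
unescaped <- strsplit(x, "|")[[1]]

print(escaped)
print(unescaped)
```

When the separator is a fixed character, `fixed = TRUE` sidesteps both escaping layers and is often the simplest choice.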

A more specific reference on regular expressions in R might be handy, although it, like many others, refers to Perl regular expressions.

dlamblin answered Dec 01 '22 14:12