
Regex \x96 -like characters

I have a few strings in a data set that contain the characters

\x96
\x92

and others.

I can't figure out how to grep for them in R.
I have tried using

pattern="\x96"
pattern="\\x96"
pattern="x96"

but to no avail.

Is there a specific way of dealing with such characters in R?


** UPDATE ** As suggested in the comments, setting perl=TRUE allows the grep to work.

Can anyone offer a solid explanation of what is going on?

Session info, in case it is relevant:

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C            LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=C        LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C        
[10] LC_TELEPHONE=C       LC_MEASUREMENT=C     LC_IDENTIFICATION=C 

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_0.9.3    RMySQL_0.9-3     DBI_0.2-5        stringr_0.6.1    data.table_1.8.6
asked Nov 03 '22 04:11 by Ricardo Saporta



1 Answer

R supports several types of regular expressions. The default is POSIX ERE (extended regular expressions), the flavor used by grep -E (egrep) and other standard POSIX tools. However, the POSIX ERE engine in R does not currently interpret escaped hex character codes such as \x96. From the documentation:

Escaping non-metacharacters with a backslash is implementation-dependent. The current implementation interprets \a as BEL, \e as ESC, \f as FF, \n as LF, \r as CR and \t as TAB. (Note that these will be interpreted by R's parser in literal character strings.)

See Regular Expressions as used in R.
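
The parser note in the quoted documentation matters for the three patterns tried in the question. A small sketch, using only base R's charToRaw(), of what each string literal holds by the time it reaches grep (the variable names p1 to p3 are just for illustration):

```r
# The three patterns from the question, as R's parser stores them.
p1 <- "\x96"    # the parser converts the escape: a single byte, 0x96
p2 <- "\\x96"   # escaped backslash: the four characters \ x 9 6
p3 <- "x96"     # no backslash at all: the three characters x 9 6

charToRaw(p1)   # 96
charToRaw(p2)   # 5c 78 39 36
charToRaw(p3)   # 78 39 36
```

So only the second form hands a \x96 escape to the regex engine, and it is then up to that engine whether the escape is understood.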

Setting perl=TRUE switches the engine R uses for regular expressions to PCRE (Perl-compatible regular expressions). PCRE supports escaped hex character codes, which is why your regex now works.
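
As a rough illustration, the sketch below searches a made-up vector for these bytes with the PCRE engine. The sample data and the useBytes = TRUE argument are assumptions added here, not part of the question or of the fix reported in it. useBytes = TRUE requests byte-wise matching, so the result should not depend on how the locale treats bytes that are not valid UTF-8.

```r
# Hypothetical sample data containing the raw bytes 0x96 and 0x92.
x <- c("range 1 \x96 5", "it\x92s", "plain ascii")

# "\\x96" reaches PCRE as the escape \x96, which PCRE decodes to byte 0x96.
hit_96 <- grepl("\\x96", x, perl = TRUE, useBytes = TRUE)

# A character class covers several such bytes in one pattern.
hit_any <- grepl("[\\x92\\x96]", x, perl = TRUE, useBytes = TRUE)

print(hit_96)
print(hit_any)
```

The pattern is written with a doubled backslash so that the escape survives R's parser and PCRE does the decoding. Under these assumptions, hit_96 should flag only the first element and hit_any the first two.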

answered Nov 15 '22 07:11 by dpkp
