
Confused with the locale settings in R

Tags: regex, r

I just answered the question "Removing characters after a EURO symbol in R". The R code works for others who are on Ubuntu, but it does not work for me.

This is my code.

x <- "services as defined in this SOW at a price of € 15,896.80 (if executed fro"
euro <- "\u20AC"
gsub(paste(euro , "(\\S+)|."), "\\1", x)
# "" 

I think this comes down to the locale settings, but I don't know how to change them.

I'm running RStudio on Windows 8.

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

loaded via a namespace (and not attached):
[1] tools_3.2.0

@Anada's answer is good, but we would need to set the encoding every time we use Unicode characters in a regex. Is there any way to change the default encoding to UTF-8 on Windows?

Avinash Raj asked Jul 08 '15 09:07




1 Answer

Seems to be a problem with encoding.
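One way to check this is to look at how R has marked a string and which bytes it actually holds. The following is only a sketch: `describe_string` is a hypothetical helper name, not a base R function. In UTF-8 the euro sign is the three bytes `e2 82 ac`, while in Windows-1252 it is the single byte `80`, so the byte dump shows which representation a given string uses.

```r
# Hypothetical helper (not part of base R): report the declared encoding
# of a single string together with its raw bytes in hexadecimal.
describe_string <- function(s) {
  list(
    declared = Encoding(s),  # "unknown", "latin1", "UTF-8" or "bytes"
    bytes    = paste(as.character(charToRaw(s)), collapse = " ")
  )
}

# The euro sign written as a Unicode escape, as in the question
euro <- "\u20AC"
info <- describe_string(euro)
print(info)

# Session-level settings that decide how unmarked strings are interpreted
print(Sys.getlocale("LC_CTYPE"))
print(l10n_info()[["UTF-8"]])  # TRUE only when the session runs in UTF-8
```

Running `describe_string(x)` on the string from the question in the affected session should show whether `x` and `euro` are stored differently.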

Consider:

x <- "services as defined in this SOW at a price of € 15,896.80 (if executed fro"
euro <- "\u20AC"  # same definition as in the question
gsub(paste(euro , "(\\S+)|."), "\\1", x)
# [1] ""
gsub(paste(euro , "(\\S+)|."), "\\1", `Encoding<-`(x, "UTF8"))
# [1] "15,896.80"
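
To avoid repeating the encoding step in every call, one option is a small wrapper that converts both the pattern and the input to UTF-8 before calling `gsub`. This is only a sketch under assumptions: `gsub_utf8` is a hypothetical helper name, not a base R function, and it assumes `enc2utf8` can translate the input from the session's native encoding, which has not been verified here on Windows with R 3.2.0.

```r
# Hypothetical wrapper (not part of base R): bring the pattern and the
# input into UTF-8 so that both sides of the match use the same encoding.
gsub_utf8 <- function(pattern, replacement, x, ...) {
  gsub(enc2utf8(pattern), replacement, enc2utf8(x), ...)
}

# The euro sign is written as an escape so the script itself stays ASCII
euro <- "\u20AC"
x <- "services as defined in this SOW at a price of \u20AC 15,896.80 (if executed fro"

result <- gsub_utf8(paste(euro, "(\\S+)|."), "\\1", x)
print(result)
# [1] "15,896.80"   (as seen in a UTF-8 session)
```

The wrapper leaves the regex itself untouched and only normalizes the encodings, so any extra `gsub` arguments such as `perl = TRUE` can still be passed through `...`.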
A5C1D2H2I1M1N2O1R2T1 answered Sep 30 '22 08:09