Confused with the locale settings in R

Tags: regex, r

I just answered the question "Removing characters after a EURO symbol in R". However, the R code that works for others on Ubuntu does not work for me.

This is my code.

x <- "services as defined in this SOW at a price of € 15,896.80 (if executed fro"
euro <- "\u20AC"
gsub(paste(euro , "(\\S+)|."), "\\1", x)
# ""

I think this comes down to the locale settings, but I don't know how to change them.

I'm running RStudio on Windows 8.

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

loaded via a namespace (and not attached):
[1] tools_3.2.0

@Anada's answer is good, but it means adding the encoding call every time we use Unicode characters in a regex. Is there any way to change the default encoding to UTF-8 on Windows?


asked Jul 08 '15 09:07

Avinash Raj

1 Answer

Seems to be a problem with encoding.
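
One way to check this is to look at the encoding marks on the strings and at the session locale. The following is only a sketch using base R; here the euro sign is written as a \u escape so the snippet does not depend on how the file or console is encoded, which is an assumption and not how x was typed in the question.

```r
# Sketch: inspect the encoding marks and the session locale (base R only).
euro <- "\u20AC"  # a \u escape yields a string marked as UTF-8
x <- "services as defined in this SOW at a price of \u20AC 15,896.80 (if executed fro"

Encoding(euro)             # "UTF-8"
Encoding(x)                # "UTF-8" here because of the \u escape; a literal
                           # typed at a Windows console may be marked differently
Sys.getlocale("LC_CTYPE")  # character-type locale of the current session
info <- l10n_info()        # list describing the session's localization
info[["UTF-8"]]            # TRUE only when the session runs in a UTF-8 locale
```

If the pattern and the input carry different marks, that mismatch is a likely place to look.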

Consider:

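euro <- "\u20AC"  # as defined in the question
x <- "services as defined in this SOW at a price of € 15,896.80 (if executed fro"
# With x left as it is, the pattern never matches the euro sign:
gsub(paste(euro, "(\\S+)|."), "\\1", x)
# [1] ""
# Setting the declared encoding of x before matching gives the expected result:
gsub(paste(euro, "(\\S+)|."), "\\1", `Encoding<-`(x, "UTF8"))
# [1] "15,896.80"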
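
To avoid repeating the encoding step in every call, one option is a small wrapper that converts both the pattern and the input to UTF-8 before matching. This is a minimal sketch: gsub_utf8 is a hypothetical helper name, not a base R function, and it assumes the mismatch is between a UTF-8 pattern and a natively encoded input. I have not verified it on the Windows setup from the question.

```r
# Hypothetical helper (not part of base R): convert the pattern and the
# input to UTF-8 with enc2utf8() and then delegate to gsub().
gsub_utf8 <- function(pattern, replacement, x, ...) {
  gsub(enc2utf8(pattern), replacement, enc2utf8(x), ...)
}

euro <- "\u20AC"
# The euro sign is written as an escape so the example does not depend on
# the encoding of the source file.
x <- "services as defined in this SOW at a price of \u20AC 15,896.80 (if executed fro"

result <- gsub_utf8(paste(euro, "(\\S+)|."), "\\1", x)
print(result)  # on a UTF-8 session: "15,896.80"
```

This does not change the session's default encoding; it only keeps the conversion in one place.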


answered Sep 30 '22 08:09

A5C1D2H2I1M1N2O1R2T1
