Sprintf in R doesn't count umlauts

I have a character vector and I want to make sure all of its elements have the same length. Hence I pad the shorter elements with spaces, like this:

vec <- c("fjdlksa01dada","rau","sjklf")
x <- sprintf("%-15s", vec)
nchar(x)
# returns
[1] 15 15 15

as answers to my previous question suggested. This works fine, but it seems to have trouble with umlauts. For example, if my vector looks like this:

vec2 <- c("fjdlksa01dada","rauü","sjklf")
y <- sprintf("%-15s", vec2)
nchar(y)
# returns
[1] 15 14 15

I am running R on Mac OS X (10.6). How can I fix this?

EDIT: Note, I am not looking to fix the output of nchar, because it is correct. The problem is that sprintf loses the umlaut.

EDIT: Updated R and changed to DWin's locale, no change at all. But:

vec2 <- c("fjdlksa01dada","rauü","sjklf")
Encoding(vec2)
# returns
[1] "unknown" "UTF-8"   "unknown"

Strange.

Matt Bannert asked Nov 04 '22 05:11


1 Answer

I found this on the ?sprintf page:

If any element of fmt or any character argument is declared as UTF-8, the element of the result will be in UTF-8 and have the encoding declared as UTF-8. Otherwise it will be in the current locale's encoding.
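To see what that passage means in practice, it can help to inspect the declared encoding and to compare character and byte counts for the string with the umlaut. A minimal sketch, assuming the umlaut is written as the escape \u00fc so that the string is marked as UTF-8 regardless of the locale:

```r
# "\u00fc" is the letter u with umlaut; the \u escape yields a UTF-8 marked string
x <- "rau\u00fc"

Encoding(x)               # declared encoding: "UTF-8"
nchar(x, type = "chars")  # 4 characters
nchar(x, type = "bytes")  # 5 bytes, because the umlaut takes two bytes in UTF-8

# Per the help page, the result of sprintf should then be declared as UTF-8 too
y <- sprintf("%-15s", x)
Encoding(y)
```

If the padded result comes out one character short, the difference between the byte count and the character count above is the likely reason.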

The input takes its locale from Rgui's locale (I think); see below.

On Windows it fortunately already prints the expected result:

> vec2 <- c("fjdlksa01dada","rauü","sjklf")
> y <- sprintf("%-15s", vec2)
> nchar(y)
[1] 15 15 15

I think on Mac OS X you can achieve this by starting R as follows, but I don't have a Mac here to actually test this:

Rgui --encoding=utf-8
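If changing the startup encoding does not help, one possible workaround is to leave sprintf out of the padding step and count characters explicitly. This is only a sketch, not tested on Mac OS X 10.6; pad_right is a hypothetical helper name, and strrep requires R 3.3.0 or later:

```r
# Hypothetical helper: right-pad each string with spaces to `width` characters
pad_right <- function(x, width) {
  n <- nchar(x, type = "chars")                # count characters, not bytes
  paste0(x, strrep(" ", pmax(0, width - n)))   # strings already long enough are left as is
}

vec2 <- c("fjdlksa01dada", "rau\u00fc", "sjklf")
y <- pad_right(vec2, 15)
nchar(y)
# [1] 15 15 15
```

Because the number of spaces is computed from the character count, a multibyte character such as the umlaut is counted once.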
Bernd Elkemann answered Nov 07 '22 22:11