I have a character vector and I want to make sure all of its elements have the same length. So I pad the short elements with spaces, like this:
vec <- c("fjdlksa01dada","rau","sjklf")
x <- sprintf("%-15s", vec)
nchar(x)
# returns
[1] 15 15 15
as the answers to my previous question suggested. This works, but it seems to have trouble with umlauts. For example, if my vector looks like this:
vec2 <- c("fjdlksa01dada","rauü","sjklf")
y <- sprintf("%-15s", vec2)
nchar(y)
# returns
[1] 15 14 15
I am running R on Mac OS X (10.6). How can I fix this?
EDIT: Note, I am not looking to fix the output of nchar, because it is correct. The problem is that sprintf loses the umlaut.
EDIT: Updated R and changed to DWin's locale; no change at all. But:
vec2 <- c("fjdlksa01dada","rauü","sjklf")
Encoding(vec2)
# returns
[1] "unknown" "UTF-8" "unknown"
Strange.
I found this on the ?sprintf page:
If any element of fmt or any character argument is declared as UTF-8, the element of the result will be in UTF-8 and have the encoding declared as UTF-8. Otherwise it will be in the current locale's encoding.
The input takes its locale from Rgui's locale (I think); see below.
On Windows, fortunately, it already prints:
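To make the encoding declaration from the help text concrete, here is a small sketch (my own illustration, not something from the question) that inspects the declared encoding and compares byte and character counts. The umlaut is written as the escape `\u00fc` so the snippet does not depend on how the file or console is encoded. Note that pure-ASCII strings are never marked with an encoding, which is why the first and third elements show up as "unknown".

```r
# Same data as vec2 in the question; "\u00fc" is the escape for ü
vec2 <- c("fjdlksa01dada", "rau\u00fc", "sjklf")

# ASCII-only elements are reported as "unknown"; the umlaut element is marked UTF-8
Encoding(vec2)

# In UTF-8 the umlaut occupies two bytes but counts as one character
nchar(vec2[2], type = "bytes")  # 5
nchar(vec2[2], type = "chars")  # 4
```

If the padding width is counted in bytes rather than characters, a 5-byte, 4-character string padded to 15 would end up 14 characters long, which would match the output in the question. That is my reading of the symptom, not something confirmed in this thread.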
> vec2 <- c("fjdlksa01dada","rauü","sjklf")
> y <- sprintf("%-15s", vec2)
> nchar(y)
[1] 15 15 15
I think on Mac OS you can achieve this by starting R as follows, but I don't have a Mac here to actually test this:
Rgui --encoding=utf-8
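If changing the startup encoding does not help, a locale-independent workaround is to skip sprintf for the padding and count characters explicitly. The following is only a sketch: `pad_right` is a hypothetical helper name I made up, and it assumes `strrep()` is available (base R 3.3.0 or later).

```r
# Hypothetical helper (not part of base R): right-pad each element with
# spaces up to `width` characters, counting characters rather than bytes.
pad_right <- function(x, width) {
  n <- nchar(x, type = "chars")
  # pmax() guards against negative repeat counts for over-long elements
  paste0(x, strrep(" ", pmax(width - n, 0)))
}

vec2 <- c("fjdlksa01dada", "rau\u00fc", "sjklf")
y <- pad_right(vec2, 15)
nchar(y)
# [1] 15 15 15
```

Elements longer than the requested width are returned unchanged instead of being truncated, which is the same as what "%-15s" does.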