Is this a bug?
> nchar(sprintf("%-20s", "Sao Paulo"))
[1] 20
> nchar(sprintf("%-20s", "São Paulo"))
[1] 19
> sessionInfo()
R version 3.2.4 (2016-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.4 (El Capitan)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_3.2.4 fortunes_1.5-2
> nchar(sprintf("%-20s", "Sao Paulo"), type = "bytes")
[1] 20
> nchar(sprintf("%-20s", "São Paulo"), type = "bytes")
[1] 20
The help page of sprintf points out that encodings matter: field widths and precisions of %s conversions are interpreted as bytes, not characters. The help page of nchar, in turn, documents three different ways of counting: "bytes", "chars" and "width".
As a consequence, I see the following (on Linux, R 3.3.0 beta):
> nchars <- function(x) vapply(c("bytes","chars","width"),
+                              function(typ) nchar(x, type=typ), 1)
> sp <- "São Paulo"
> Encoding(sp)
[1] "UTF-8"
> nchars(sp)
bytes chars width
   10     9     9
> nchars(sprintf("%-20s", sp))
bytes chars width
   20    19    19
>
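So I'm claiming there is no bug at all: sprintf pads the field to 20 bytes, and because "ã" occupies two bytes in UTF-8, the padded string has only 19 characters. I'm not saying much more than @TheRimalaya, but I am drawing a different conclusion.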
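If what you actually want is a field of 20 display columns rather than 20 bytes, one option is to do the padding by hand. The following is only a minimal sketch: pad_right is a hypothetical helper name (it is not a base R function), and strrep needs R >= 3.3.0.

```r
# Hypothetical helper (not part of base R): left-justify strings in a field
# whose size is measured in display columns instead of bytes.
pad_right <- function(x, width) {
  # Spaces still needed once the display width of each string is known;
  # strings already wider than the field are left unchanged.
  n_pad <- pmax(0, width - nchar(x, type = "width"))
  paste0(x, strrep(" ", n_pad))
}

# "São Paulo", written with an escape so the file encoding does not matter
sp <- "S\u00e3o Paulo"
padded <- pad_right(c("Sao Paulo", sp), 20)

nchar(padded, type = "chars")
nchar(padded, type = "width")
```

With this approach both strings should come out 20 characters and 20 columns wide, while their byte counts differ. Counting with type = "width" rather than type = "chars" is a deliberate choice here, since it also accounts for characters that take up two columns on screen.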