 

utf-8 in ggplot axis labels


I'm struggling to get ggplot to display axis labels correctly when they are in a non-standard character set (Russian). When I use such strings, ggplot labels the axes with escape codes, e.g.

\ U+0441 U+043D U+0433

ggplot gets the encoding right when I save the names as a separate variable and plot them as labels using geom_text().

Converting the encoding of the data frame column doesn't help either: db$variable=sapply(db$variable,function(row) iconv(row,to='UTF-8')) results in scrambled characters, presumably because the data is already encoded as UTF-8 in the data frame.
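Before converting anything, it can help to check how R has marked the strings. The following is a minimal sketch, assuming a hypothetical vector `x` that stands in for `db$variable`; the Cyrillic letters are written as Unicode escapes so the example does not depend on the encoding of the script file.

```r
# Hypothetical stand-in for db$variable: three Cyrillic letters as Unicode escapes.
x <- c("\u0441", "\u043d", "\u0433")

# Declared encoding of each string ("UTF-8", "latin1", "bytes" or "unknown").
Encoding(x)

# TRUE where the underlying bytes are well-formed UTF-8.
validUTF8(x)

# enc2utf8() returns strings that are already marked as UTF-8 unchanged,
# so it is safe to call even when no conversion is needed.
y <- enc2utf8(x)
identical(x, y)
```

If the strings are already valid UTF-8, calling iconv() without a `from` argument treats them as being in the session's native encoding, which is one plausible reason for the scrambled output.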

I can make this work by supplying custom axis labels with scale_x_discrete(labels=names), but this is a bit unwieldy, especially when the data has missing values. Is there any way to get ggplot to display these characters correctly in the first place?
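For reference, the workaround is less fragile when the labels are passed as a named vector, because ggplot2 then matches labels to axis breaks by name and not by position. This is only a sketch: the data frame `db`, its columns `key` and `value`, and the lookup `labels_ru` are hypothetical names made up for illustration.

```r
library(ggplot2)

# Hypothetical data: plain ASCII keys in the data frame.
db <- data.frame(key = c("a", "b", "c"), value = c(3, 1, 2))

# Lookup from key to Cyrillic label, written as Unicode escapes.
# "d" has no matching row in db; the extra entry is simply not used.
labels_ru <- c(a = "\u0441", b = "\u043d", c = "\u0433", d = "\u0434")

p <- ggplot(db, aes(x = key, y = value)) +
  geom_col() +
  # A named vector is matched to the breaks by name, so missing
  # categories do not shift the remaining labels.
  scale_x_discrete(labels = labels_ru)

print(p)
```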

edit

After some head scratching it seems

Sys.setlocale("LC_CTYPE","russian")

will solve the problem. I still don't really understand why R/ggplot is inconsistent about the circumstances under which it will accept UTF-8 text, though. In the example above the problem was limited to axis labels. Is this because the axis label strings were fetched from a data frame, which somehow handles encoding differently than when the same strings are stored in a string or matrix?

Rolf Fredheim asked Dec 11 '12 12:12



1 Answer

I guess this has been solved in the most recent version of ggplot.

library(tidyverse)
library(ggrepel)

russian_names<-structure(list(rowname = c("Мазда RX4", "Мазда RX4 Вагон", "Датсун 710", 
                                          "Хорнет 4 Drive", "Хорнет Sportabout", "Валиант", "Дастер 360", 
                                          "Мерседес 240D", "Мерседес 230", "Мерседес 280", "Мерседес 280C", "Мерседес 450SE", 
                                          "Мерседес 450SL", "Мерседес 450SLC", "Кадиллак Флитвуд", "Линкольн Континенталь", 
                                          "Крайслер Империал", "Фиат 128", "Хонда Сивик", "Тойота Королла", 
                                          "Тойота Корона", "Додж Чаленджер", "ЭйЭмСи Джавелин", "Камаро Z28", 
                                          "Понтиак Файербёрд", "Фиат X1-9", "Порш 914-2", "Лотус Европа", 
                                          "Форд Пантера L", "Феррари Дино", "Мазерати Бора", "Вольво 142E"
)), row.names = c(NA, -32L), class = "data.frame", .Names = "rowname")

# Attach the Russian names as a `rowname` column, then label each point with it
mtcars %>% bind_cols(russian_names) %>%
  ggplot(mapping = aes(x = mpg, y = disp)) +
  geom_point() +
  geom_label_repel(aes(label = rowname), size = 2) +
  labs(x = "Миль на галлон",
       y = "Замещение, куб.дюйм")

This results in a proper plot, with the point labels and axis titles rendered in Cyrillic.
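If the labels still come out as escape codes, the session locale is worth checking, since the question's fix was a locale change. The snippet below is a rough diagnostic sketch and not a definitive test; `label` is a hypothetical string used only for illustration.

```r
# Inspect the character-type locale of the current session.
ctype <- Sys.getlocale("LC_CTYPE")
is_utf8 <- isTRUE(l10n_info()[["UTF-8"]])
message("LC_CTYPE: ", ctype, " (UTF-8 locale: ", is_utf8, ")")

# Hypothetical label written with Unicode escapes.
label <- "\u0441\u043d\u0433"

# How R has marked the string.
Encoding(label)

# Converting to the native encoding is the step where characters the
# locale cannot represent may be escaped or lost.
enc2native(label)
```

In a UTF-8 locale the last call should return the string unchanged. In a locale that cannot represent Cyrillic, it is a likely place for escape codes to appear.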

dmi3kno answered Sep 22 '22 12:09
