
R: Change character encoding of columns in data frame

I'm investigating how the character encoding affects sorting. My question here is:

How can I change a single column of a data frame to a different character encoding?

For context, I will include several extra steps at the bottom.

1) Create the data frame:

d.enc <- data.frame( utf8 = c(" ", "_ ", " _"), 
                     mac = c(" ", "_ ", " _"), 
                     label = c("space", "underscore space", "space underscore") )

2) Convert to character vectors and attempt to set encoding:

# Start from a copy so that d.enc2 exists before its columns are assigned
d.enc2 <- d.enc
d.enc2$utf8 <- as.character(d.enc$utf8)
d.enc2$mac <- as.character(d.enc$mac)
d.enc2$label <- as.character(d.enc$label)

Encoding(d.enc2$utf8) <- "UTF-8"
Encoding(d.enc2$mac) <- "MACINTOSH"
Encoding(d.enc2$utf8)
# [1] "unknown" "unknown" "unknown"
Encoding(d.enc2$mac)
# [1] "unknown" "unknown" "unknown"

3) That's not what I was hoping for. I would have expected:

# [1] "UTF-8" "UTF-8" "UTF-8" and
# [1] "MACINTOSH" "MACINTOSH" "MACINTOSH"
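
A possible explanation for the "unknown" result, as a minimal sketch rather than a definitive diagnosis: `Encoding<-` only sets a declared-encoding mark on a string, the documented marks are "latin1", "UTF-8", "bytes" and "unknown" (so "MACINTOSH" is not among them), and pure-ASCII strings such as " " and "_ " are never marked. Changing the underlying bytes is a job for `iconv()`. The string `"caf\u00e9"` and the variable names are made up for illustration.

```r
# Pure-ASCII strings never carry an encoding mark, so this stays "unknown".
ascii <- c(" ", "_ ", " _")
Encoding(ascii) <- "UTF-8"
Encoding(ascii)

# A non-ASCII string written with a \u escape is stored as UTF-8.
utf8 <- "caf\u00e9"
Encoding(utf8)

# iconv() re-encodes the bytes; Encoding<- would only relabel them.
latin1 <- iconv(utf8, from = "UTF-8", to = "latin1")
charToRaw(utf8)    # 63 61 66 c3 a9
charToRaw(latin1)  # 63 61 66 e9
```

Where `iconvlist()` reports it, `to = "MACINTOSH"` could be swapped in for `"latin1"`; the result would then show as "unknown", since `iconv()` only marks latin1 and UTF-8 output.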

4) Are my desired encodings supported? (I'm running on a Mac.)

temp <- iconvlist()
temp[399]
# [1] "UTF-8"
temp[338]
# [1] "MACINTOSH"

Seems that they are supported.
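
The positions 399 and 338 are specific to one machine, so a lookup by name may be more portable. A small sketch, assuming only base R; the variable names `wanted`, `available` and `supported` are illustrative:

```r
# Look the encodings up by name; positions in iconvlist() vary by platform.
wanted <- c("UTF-8", "MACINTOSH")
available <- iconvlist()

# Compare case-insensitively, since spelling of encoding names can differ.
supported <- toupper(wanted) %in% toupper(available)
names(supported) <- wanted
supported
```

This prints a named logical vector, one entry per requested encoding.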

5) Once I can change the encodings, I would like to do the following to see how the sorting order changes:

library(dplyr)
arrange(d.enc2, desc(utf8))
arrange(d.enc2, desc(mac))

6) I expect the output will look something like this but in a different order depending on which column is used for the sorting:

  utf8 mac            label
1   _   _  underscore space
2    _   _ space underscore
3                     space
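
One thing that may matter here: R compares strings using the collating sequence of the current locale, so the sort order can depend on `LC_COLLATE` rather than on the declared encoding of a column. The sketch below uses base R instead of dplyr and contrasts the locale order with the C-locale (byte) order that `method = "radix"` gives; `x` simply repeats the column values from step 1.

```r
x <- c(" ", "_ ", " _")

# Locale-dependent ordering (result varies with Sys.getlocale("LC_COLLATE")).
locale_desc <- x[order(x, decreasing = TRUE)]

# C-locale ordering: plain byte comparison, space (0x20) sorts before "_" (0x5f).
c_desc <- x[order(x, decreasing = TRUE, method = "radix")]
c_desc
# [1] "_ " " _" " "
```

If `locale_desc` and `c_desc` differ on a given machine, the difference comes from the collation rules, not from the encoding mark.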

Thanks for any tips!

Bobby asked Mar 14 '16 12:03

1 Answer

Maybe late, but I saw this at: R- Changing encoding of column in dataframe?

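for (col in colnames(mydataframe)) {
  # Encoding<- only accepts character vectors, so skip other column types
  if (is.character(mydataframe[[col]])) Encoding(mydataframe[[col]]) <- "UTF-8"
}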
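
Note that the loop only declares how the existing bytes should be read. If the goal is to actually convert the columns, something along these lines might work; `reencode_columns` is a hypothetical helper written for this answer, not an existing function, and the sample data frame is invented.

```r
# Hypothetical helper: re-encode every character column of a data frame.
reencode_columns <- function(df, from = "UTF-8", to = "latin1") {
  is_chr <- vapply(df, is.character, logical(1))
  # iconv() converts the bytes of each string from `from` to `to`.
  df[is_chr] <- lapply(df[is_chr], iconv, from = from, to = to)
  df
}

mydataframe <- data.frame(
  name = c("caf\u00e9", "na\u00efve"),
  n = c(1L, 2L),
  stringsAsFactors = FALSE
)

converted <- reencode_columns(mydataframe)
Encoding(converted$name)
# [1] "latin1" "latin1"
```

Non-character columns such as `n` are left untouched.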

TropicalMagic answered Nov 15 '22 04:11