I am trying to read in data from a csv file and specify the encoding of the characters to be UTF-8. From reading through the ?read.csv() instructions, it seems that fileEncoding set equal to UTF-8 should accomplish this, however, I am not seeing that when checking. Is there a better way to specify the encoding of character strings to be UTF-8 when importing the data?
Sample Data:
Download Sample Data here
fruit<- read.csv("fruit.csv", header = TRUE, fileEncoding = "UTF-8")
fruit[] <- lapply(fruit, as.character)
Encoding(fruit$Fruit)
The output is "uknown" but I would expect this to be "UTF-8". What is the best way to ensure all imported characters are UTF-8? Thank you.
UTF-8, or "Unicode Transformation Format, 8 Bit" is a marketing operations pro's best friend when it comes to data imports and exports. It refers to how a file's character data is encoded when moving files between systems.
One easy way to change excel ANSI encoding to UTF-8 is the open the . csv file in notepad then select File > Save As. Now at the bottom you will see encoding is set to ANSI change it to UTF-8 and save the file as a new file and then you're done.
fruit <- read.csv("fruit.csv", header = TRUE)
fruit[] <- lapply(fruit, as.character)
fruit$Fruit <- paste0(fruit$Fruit, "\xfcmlaut") # Get non-ASCII char and jam it in!
Encoding(fruit$Fruit)
[1] "latin1" "latin1" "latin1"
fruit$Fruit <- enc2utf8(fruit$Fruit)
Encoding(fruit$Fruit)
[1] "UTF-8" "UTF-8" "UTF-8"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With