I am struggling to get R to read a CSV file in which some columns contain standard English characters, some are numeric, and some contain Japanese characters. Here is what the data looks like:
category,desc,otherdesc,volume
UPC - 31401 Age Itameabura,かどや製油 純白ごま油,OIL_OTHERS_SML_ECO,83.0
UPC - 31401 Age Itameabura,オレインリッチ,OIL_OTHERS_MED,137.0
UPC - 31401 Age Itameabura,TVキャノーラ油,OIL_CANOLA_OTHERS_LRG,3026.0
If I keep R's language setting as English, the Japanese characters are converted into gibberish. When I change the language setting in R to Japanese with Sys.setlocale("LC_CTYPE", "japanese"), the file is not read in at all. R gives this error:
Error in make.names(col.names, unique = TRUE) : invalid multibyte string at 'サcategory'
I have no clue what's wrong with my CSV file or the header names. Can you guide me on how to read this CSV file into R so that everything is displayed just as it appears in the CSV file?
Thanks! Vish
Character encodings. There are several standard methods to encode Japanese characters for use on a computer, including JIS, Shift-JIS, EUC, and Unicode.
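To make the difference between these encodings concrete, here is a small sketch that converts one Japanese character into three of them and prints the resulting bytes. The helper name to_bytes is made up for this illustration, and the encoding names ("CP932" for the Windows flavour of Shift-JIS, "EUC-JP") are assumed to be available in your platform's iconv; iconvlist() shows what is actually supported.

```r
# One character, written with a Unicode escape so the script itself
# does not depend on how the source file is encoded.
x <- "\u3042"  # HIRAGANA LETTER A

# Hypothetical helper: return the raw bytes of `s` in encoding `enc`.
to_bytes <- function(s, enc) {
  iconv(s, from = "UTF-8", to = enc, toRaw = TRUE)[[1]]
}

utf8_bytes <- to_bytes(x, "UTF-8")   # e3 81 82
sjis_bytes <- to_bytes(x, "CP932")   # 82 a0
euc_bytes  <- to_bytes(x, "EUC-JP")  # a4 a2

print(utf8_bytes)
print(sjis_bytes)
print(euc_bytes)
```

The same character is stored as different bytes in each encoding, which is why a file read with the wrong encoding shows up as gibberish or triggers an "invalid multibyte string" error.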
Reading a CSV file. The contents of a CSV file can be read as a data frame in R using the read.csv(…) function. The CSV file should either be in the current working directory, or the working directory should be set accordingly using setwd(…).
For Japanese, the following works for me:
df <- read.csv("your_file.csv", fileEncoding="cp932")
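If you want to check that fileEncoding does what you expect before pointing it at the real file, a round trip like the sketch below may help. It writes one row from the question into a temporary file as CP932 bytes and reads it back. The temporary path and variable names are only for illustration, and it assumes your R session can represent Japanese text (for example a UTF-8 locale); whether your actual file is CP932, UTF-8, or something else is something you still have to find out.

```r
# Build a two-line CSV; the Japanese field is written with Unicode
# escapes so the script does not depend on the source file encoding.
desc <- "\u30aa\u30ec\u30a4\u30f3\u30ea\u30c3\u30c1"
csv_text <- paste0(
  "category,desc,otherdesc,volume\n",
  "UPC - 31401 Age Itameabura,", desc, ",OIL_OTHERS_MED,137.0\n"
)

# Convert the text to CP932 bytes and write them unchanged to a temp file.
path <- tempfile(fileext = ".csv")
writeBin(iconv(csv_text, from = "UTF-8", to = "CP932", toRaw = TRUE)[[1]], path)

# Read it back, telling R which encoding the file is in.
df <- read.csv(path, fileEncoding = "CP932", stringsAsFactors = FALSE)
print(names(df))
print(df$volume)
```

If the header and the numeric column come back intact here but not with your own file, the file is probably not CP932; trying fileEncoding = "UTF-8" or "UTF-8-BOM" would be a reasonable next step.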