
Reading csv file with Japanese characters into R

I am struggling to get R to read a csv file in which some columns contain standard English characters, some are numeric, and some contain Japanese characters. Here is what the data looks like:

category,desc,otherdesc,volume
UPC - 31401 Age Itameabura,かどや製油 純白ごま油,OIL_OTHERS_SML_ECO,83.0
UPC - 31401 Age Itameabura,オレインリッチ,OIL_OTHERS_MED,137.0
UPC - 31401 Age Itameabura,TVキャノーラ油,OIL_CANOLA_OTHERS_LRG,3026.0 

With R's locale left as English, the Japanese characters are converted into gibberish. When I change the locale to Japanese with Sys.setlocale("LC_CTYPE", "japanese"), the file is not read in at all. R gives this error:

Error in make.names(col.names, unique = TRUE) : invalid multibyte string at 'サcategory'

I have no clue what's wrong with my csv file or the header names. How can I read this csv file into R so that everything is displayed just as it appears in the csv file?

Thanks! Vish

user2895779 asked Oct 18 '13 17:10



1 Answer

For Japanese, the following works for me:

df <- read.csv("your_file.csv", fileEncoding="cp932")
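To see why fileEncoding matters, here is a minimal round-trip sketch. It assumes the original file was saved as CP932 (Windows Shift-JIS), which the question does not confirm, and that the R session runs in a locale that can represent Japanese text (for example UTF-8). The temporary file and the two sample rows are made up for illustration, reusing values from the question.

```r
# Two sample rows from the question. The Japanese text is written with \u escapes
# so this script does not depend on the encoding of the script file itself.
lines <- c(
  "category,desc,otherdesc,volume",
  # desc is "オレインリッチ"
  "UPC - 31401 Age Itameabura,\u30aa\u30ec\u30a4\u30f3\u30ea\u30c3\u30c1,OIL_OTHERS_MED,137.0",
  # desc is "TVキャノーラ油"
  "UPC - 31401 Age Itameabura,TV\u30ad\u30e3\u30ce\u30fc\u30e9\u6cb9,OIL_CANOLA_OTHERS_LRG,3026.0"
)

# Convert the text to CP932 bytes and write them to a hypothetical temp file.
tmp <- tempfile(fileext = ".csv")
txt <- paste0(paste(lines, collapse = "\n"), "\n")
bytes <- iconv(txt, from = "UTF-8", to = "CP932", toRaw = TRUE)[[1]]
writeBin(bytes, tmp)

# Tell read.csv which encoding the file is in; R re-encodes the content
# to the session's native encoding while reading.
df <- read.csv(tmp, fileEncoding = "CP932", stringsAsFactors = FALSE)
print(df)
```

If the file turns out to be UTF-8 rather than CP932, fileEncoding = "UTF-8" would be the value to try instead, or "UTF-8-BOM" if the file starts with a byte order mark. Running readBin("your_file.csv", what = "raw", n = 3) shows the first three bytes; a UTF-8 byte order mark appears as ef bb bf.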

MarKo9 answered Oct 26 '22 16:10