I am trying to import a csv that is in Japanese. This code:
url <- 'http://www.mof.go.jp/international_policy/reference/itn_transactions_in_securities/week.csv' x <- read.csv(url, header=FALSE, stringsAsFactors=FALSE)
returns the following error:
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) : invalid multibyte string at '<91>ΊO<8b>y<82>ёΓ<e0><8f>،<94><94><84><94><83><8c>_<96>̏@(<8f>T<8e><9f><81>E<8e>w<92><e8><95>@<8a>փx<81>[<83>X<81>j'
I tried changing the encoding (Encoding(url) <- 'UTF-8'
and also to latin1) and tried removing the read.csv parameters, but received the same "invalid multibyte string" message in each case. Is there a different encoding that should be used, or is there some other problem?
A multibyte-string is one which uses more than one byte to store each character (probably a Unicode string).
A null-terminated multibyte string (NTMBS), or "multibyte string", is a sequence of nonzero bytes followed by a byte with value zero (the terminating null character). Each character stored in the string may occupy more than one byte.
To load a. csv file into the current script and operate with it, use the read. csv() method in base R. The output is delivered as a data frame, with row numbers given to integers starting at 1.
Before you can use the read_csv function, you have to load readr, the R package that houses read_csv.
Encoding
sets the encoding of a character string. It doesn't set the encoding of the file represented by the character string, which is what you want.
This worked for me, after trying "UTF-8"
:
x <- read.csv(url, header=FALSE, stringsAsFactors=FALSE, fileEncoding="latin1")
And you may want to skip the first 16 lines, and read in the headers separately. Either way, there's still quite a bit of cleaning up to do.
x <- read.csv(url, header=FALSE, stringsAsFactors=FALSE, fileEncoding="latin1", skip=16) # get started with the clean-up x[,1] <- gsub("\u0081|`", "", x[,1]) # get rid of odd characters x[,-1] <- as.data.frame(lapply(x[,-1], # convert to numbers function(d) type.convert(gsub(d, pattern=",", replace=""))))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With