Encoding problem when your package contains functions with non-English characters

I am building my own package, and I keep running into encoding issues because the functions in my package contain non-English (non-ASCII) characters.

Korean characters are inherently part of many of the functions in my package. A sample function:

library(rvest)
sampleprob <- function(url) {
  # sample url: "http://dart.fss.or.kr/dsaf001/main.do?rcpNo=20200330003851"
  result <- grepl("연결재무제표 주석", html_text(read_html(url)))
  return(result)
}

However, when installing the package I run into encoding problems.

I created a sample package (https://github.com/hyk0127/KorEncod/) containing just the function shown above and uploaded it to GitHub as a reproducible example. I run the following code to install it:

library(devtools)
install_github("hyk0127/KorEncod")

Below is the error message that I see:

Error : (converted from warning) unable to re-encode 'hello.R' line 7
ERROR: unable to collate and parse R files for package 'KorEncod'
* removing 'C:/Users/myname/Documents/R/win-library/3.6/KorEncod'
* restoring previous 'C:/Users/myname/Documents/R/win-library/3.6/KorEncod'
Error: Failed to install 'KorEncod' from GitHub:
  (converted from warning) installation of package ‘C:/Users/myname/AppData/Local/Temp/RtmpmS5ZOe/file48c02d205c44/KorEncod_0.1.0.tar.gz’ had non-zero exit status

The error message about line 7 refers to the Korean characters in the function.

It is possible to install the package locally from the tar.gz file, but then the function does not run as intended, because the Korean characters come through with a broken encoding.

This cannot be the first time that someone has tried building a package that has non-English (or non-ASCII) characters, and yet I couldn't find a solution to this. Any help will be deeply appreciated.

A few pieces of info that I think are related:

Currently the DESCRIPTION file specifies "Encoding: UTF-8".

I have used Sys.setlocale to switch the locale to Korean and back, to no avail. I have also specified @encoding UTF-8 for the function, to no avail.

I am currently using Windows where the administrative language is set to English. I have tried using a different laptop with Windows & administrative language set to Korean, and the same problem appears.

asked Feb 25 '21 02:02

Hong

2 Answers

The key trick is replacing the non-ASCII characters with their Unicode escape codes (the \uxxxx notation).

These can be generated with the stringi::stri_escape_unicode() function.

Note that the Korean characters have to be removed from your code completely in order to pass R CMD check. For every R script in the package this means a manual cycle: copy the string, re-encode it with {stringi} on the command line, and paste the result back.

I am not aware of an available automated solution for this problem.
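
That said, the copy, re-encode and paste cycle could probably be scripted. Below is a minimal sketch, not a tested tool: escape_non_ascii() and escape_file() are hypothetical helper names, and the approach assumes the source files are saved as UTF-8 and that non-ASCII characters occur only inside string literals or comments (a \uxxxx escape is not valid in an identifier). Only the non-ASCII runs are passed to stringi::stri_escape_unicode(), because that function would also escape quotes and backslashes if it were given a whole line.

```r
# Hypothetical helper: replace each run of non-ASCII characters in `x`
# with its \uxxxx escape, leaving all ASCII text (quotes, backslashes) alone.
escape_non_ascii <- function(x) {
  runs <- gregexpr("[^\\x01-\\x7F]+", x, perl = TRUE)
  regmatches(x, runs) <- lapply(regmatches(x, runs), stringi::stri_escape_unicode)
  x
}

# Hypothetical wrapper: rewrite one UTF-8 source file in place.
# Assumption: the file is under version control, so the change can be reviewed.
escape_file <- function(path) {
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  writeLines(escape_non_ascii(lines), path)
}

# Demo on a single line of code. The input is written with escapes here
# only so that this snippet itself stays ASCII.
cat(escape_non_ascii('grepl("\uc5f0\uacb0", x)'), "\n")
```

It would be worth diffing each file after running something like this, since the sketch does not parse the code and cannot tell a string literal from any other context.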

For the example provided, the escaped code would read like this:

sampleprob <- function(url) {
  # stringi::stri_escape_unicode("연결재무제표 주석") to get the \uxxxx codes
  result <- grepl("\uc5f0\uacb0\uc7ac\ubb34\uc81c\ud45c \uc8fc\uc11d", 
                  rvest::html_text(xml2::read_html(url)))
  return(result)
}
sampleprob("http://dart.fss.or.kr/dsaf001/main.do?rcpNo=20200330003851")
[1] TRUE

This will be a hassle, but it seems to be the only way to make your code platform neutral (which is a key CRAN requirement, and thus subject to R CMD check).
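
Before running R CMD check, it may help to list which lines still contain non-ASCII characters. A small sketch, assuming it is run from the package root with the sources in ./R; non_ascii_lines() is a hypothetical helper name. The tools package shipped with R also offers tools::showNonASCIIfile() for a similar purpose.

```r
# Hypothetical helper: indices of the elements of `lines` that contain
# at least one non-ASCII character.
non_ascii_lines <- function(lines) {
  which(grepl("[^\\x01-\\x7F]", lines, perl = TRUE))
}

# Scan every .R file under ./R (assumed layout) and report offending lines.
for (f in list.files("R", pattern = "\\.[Rr]$", full.names = TRUE)) {
  hits <- non_ascii_lines(readLines(f, encoding = "UTF-8", warn = FALSE))
  if (length(hits) > 0) {
    cat(f, ": line(s) ", paste(hits, collapse = ", "), "\n", sep = "")
  }
}
```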

answered Oct 22 '22 05:10

Jindra Lacko

Adding this for future reference (for those facing similar problems): you can also solve this problem by saving the non-ASCII strings in a data file, then loading the value and using it.

First save the string as package data (using the standard package folder names, with the usethis and roxygen2 packages):

# In your package, save as a separate file within ./data-raw
kor_chrs <- list(sampleprob = "연결재무제표 주석")
usethis::use_data(kor_chrs)

Then, in your functions, use the stored values.

# This is your R file for the function within ./R folder
#' @importFrom rvest html_text
#' @importFrom xml2  read_html
#' @export
sampleprob <- function(url) {
  # sample url: "http://dart.fss.or.kr/dsaf001/main.do?rcpNo=20200330003851"
  result <- grepl(kor_chrs$sampleprob[1], html_text(read_html(url)))
  return(result)
}

Yes, this is still a workaround, but it runs on Windows machines without any trouble.
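
A quick way to convince yourself that the string survives the trip through a data file: as far as I understand, R stores a string's declared encoding along with it when saving, so the value comes back marked as UTF-8 regardless of the session locale. The sketch below is only an illustration under that assumption. It uses save() and load() on a temporary file rather than the package's data folder, and it writes the string with \u escapes only to keep the snippet itself ASCII.

```r
# Same list as in the data-raw script, written with escapes for portability.
kor_chrs <- list(sampleprob = "\uc5f0\uacb0\uc7ac\ubb34\uc81c\ud45c \uc8fc\uc11d")

# Save to a temporary .rda file and load it back into a separate environment.
tmp <- tempfile(fileext = ".rda")
save(kor_chrs, file = tmp)
restored <- new.env()
load(tmp, envir = restored)

# The restored string should carry the same encoding mark and contents.
Encoding(restored$kor_chrs$sampleprob)
identical(restored$kor_chrs, kor_chrs)
```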

answered Oct 22 '22 06:10

Hong

Tags: package, r, encoding, roxygen
