I've got a bunch of CSV files that I'm reading into R and including in a package's data folder in .rdata format. Unfortunately, the non-ASCII characters in the data fail the check. The tools package has two functions to check for non-ASCII characters (showNonASCII and showNonASCIIfile), but I can't seem to locate one to remove/clean them.
Before I explore other UNIX tools, it would be great to do this all in R so I can maintain a complete workflow from raw data to final product. Are there any existing packages/functions to help me get rid of the non-ASCII characters?
These days, a slightly better approach is to use the stringi package, which provides a function for general Unicode conversion. This allows you to preserve the original text as much as possible:
    x <- c("Ekstr\u00f8m", "J\u00f6reskog", "bi\u00dfchen Z\u00fcrcher")
    x
    #> [1] "Ekstrøm" "Jöreskog" "bißchen Zürcher"
    stringi::stri_trans_general(x, "latin-ascii")
    #> [1] "Ekstrom" "Joreskog" "bisschen Zurcher"
To simply remove the non-ASCII characters, you could use base R's iconv(), setting sub = "". Something like this should work:
    x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")  # e.g. from ?iconv
    Encoding(x) <- "latin1"  # (just to make sure)
    x
    # [1] "Ekstrøm" "Jöreskog" "bißchen Zürcher"
    iconv(x, "latin1", "ASCII", sub="")
    # [1] "Ekstrm" "Jreskog" "bichen Zrcher"
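For the workflow in the question (CSV files read into data frames and then saved for a package), the same iconv() call could be applied to every character column. The following is only a rough sketch: strip_non_ascii() is a hypothetical helper name, the small data frame stands in for one read from a CSV file, and it assumes the strings are UTF-8 (change the from argument if your files are latin1, as in the example above).

```r
# Hypothetical helper (not part of any package): remove non-ASCII
# characters from all character columns of a data frame.
# Assumes the text is UTF-8; adjust `from` to match your files.
strip_non_ascii <- function(df, from = "UTF-8") {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(
    df[is_chr],
    function(col) iconv(col, from = from, to = "ASCII", sub = "")
  )
  df
}

# Stand-in for a data frame read from a CSV file
df <- data.frame(
  name = c("Ekstr\u00f8m", "J\u00f6reskog"),
  n = c(1L, 2L),
  stringsAsFactors = FALSE
)

clean <- strip_non_ascii(df)
clean$name
```

Factor levels and column names are not touched by this sketch, so they would need the same treatment if they can contain non-ASCII characters.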
To locate non-ASCII characters, or to find if there were any at all in your files, you could likely adapt the following ideas:
    ## Do *any* lines contain non-ASCII characters?
    any(grepl("I_WAS_NOT_ASCII", iconv(x, "latin1", "ASCII", sub="I_WAS_NOT_ASCII")))
    # [1] TRUE

    ## Find which lines (e.g. read in by readLines()) contain non-ASCII characters
    grep("I_WAS_NOT_ASCII", iconv(x, "latin1", "ASCII", sub="I_WAS_NOT_ASCII"))
    # [1] 1 2 3
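A possible variation that avoids the sentinel string: when sub is left at its default of NA, iconv() returns NA for any element it cannot convert, so is.na() can flag those elements directly. This is a sketch under the assumption that the input is UTF-8 and contains no genuine NA values (which would also be flagged); the variable names are only illustrative.

```r
x <- c("plain ascii", "Ekstr\u00f8m", "J\u00f6reskog")

# With sub = NA (the default), elements that cannot be represented
# in ASCII come back as NA.
has_non_ascii <- is.na(iconv(x, from = "UTF-8", to = "ASCII"))

# Indices of the elements that contain non-ASCII characters
which(has_non_ascii)
```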