
Detect non-ASCII characters in a string

Tags:

r

How can I detect non-ASCII characters in a vector of strings in a grep-like fashion? For example, for the vector below I'd like to return c(1, 3) or c(TRUE, FALSE, TRUE, FALSE):

x <- c("façile test of showNonASCII(): details{",
       "This is a good line",
       "This has an ümlaut in it.",
       "OK again. }")

Attempt:

y <- tools::showNonASCII(x)
str(y)
p <- capture.output(tools::showNonASCII(x))
Tyler Rinker asked Jan 05 '16 14:01




1 Answer

I came across this later. It uses a pure base R regex and is very simple:

grepl("[^ -~]", x)
## [1]  TRUE FALSE  TRUE FALSE

More here: http://www.catonmat.net/blog/my-favorite-regex/
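
For anyone wondering why this works: `[ -~]` is the range from space (0x20) to tilde (0x7E), which covers the printable ASCII characters, so the negated class `[^ -~]` matches anything outside that range. Below is a minimal sketch of both return forms the question asked for. The `\u` escapes only keep the example independent of file encoding, and the variable names are my own. The last line is a variant that is not part of the answer above: since `[^ -~]` also flags ASCII control characters such as tabs, it assumes you want everything up to 0x7F to count as ASCII and uses a PCRE class for that.

```r
# Same vector as in the question; \u escapes keep this source pure ASCII.
x <- c("fa\u00e7ile test of showNonASCII(): details{",
       "This is a good line",
       "This has an \u00fcmlaut in it.",
       "OK again. }")

# Logical form: TRUE where a string has a character outside space..tilde.
has_non_printable <- grepl("[^ -~]", x)

# Index form: grep() returns the positions of the matching elements.
idx <- grep("[^ -~]", x)

# Variant (an assumption, not from the answer): treat every code point up to
# 0x7F as ASCII, so tabs and newlines are not flagged.
strict_non_ascii <- grepl("[^\\x00-\\x7F]", x, perl = TRUE)
```

If the strings can contain tabs or newlines, the two patterns will disagree on those elements, so pick whichever matches what you mean by "non-ASCII".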

Tyler Rinker answered Sep 30 '22 22:09