How can I detect non-ASCII characters in a vector of strings in a grep-like fashion? For example, below I'd like to return c(1, 3) or c(TRUE, FALSE, TRUE, FALSE):
x <- c("façile test of showNonASCII(): details{", "This is a good line", "This has an ümlaut in it.", "OK again. }")
Attempt:
y <- tools::showNonASCII(x)
str(y)
p <- capture.output(tools::showNonASCII(x))
Outside R, Python's str.isascii() method returns a boolean: True means every character in the string is ASCII, and False means the string contains at least one non-ASCII character.
To identify non-Unicode characters in a file, we can drag and drop the file into a browser such as Google Chrome or Mozilla Firefox. Chrome will show us only the row and column number of the .
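I came across this later. It uses only a base R regex and is very simple: the character class [^ -~] matches any character outside the printable ASCII range, which runs from space to tilde.
grepl("[^ -~]", x)
## [1] TRUE FALSE TRUE FALSE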
More here: http://www.catonmat.net/blog/my-favorite-regex/
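To get both forms asked for in the question, the logical vector from grepl() can be turned into indices with which(). The sketch below is only an illustration: it rewrites the example vector with \u escapes so the script stays ASCII, and has_high_byte() is a hypothetical helper (not from the original answer) that cross-checks the regex by looking for bytes above 127.

```r
# Same example vector as in the question, written with \u escapes so the
# script itself contains only ASCII characters.
x <- c("fa\u00e7ile test of showNonASCII(): details{",
       "This is a good line",
       "This has an \u00fcmlaut in it.",
       "OK again. }")

# Logical form: TRUE where a string has a character outside space..tilde.
has_non_ascii <- grepl("[^ -~]", x)

# Index form: positions of the flagged strings.
idx <- which(has_non_ascii)

# Cross-check (hypothetical helper): any byte above 127 means non-ASCII.
has_high_byte <- function(s) any(as.integer(charToRaw(s)) > 127L)
byte_check <- vapply(x, has_high_byte, logical(1), USE.NAMES = FALSE)

print(has_non_ascii)
print(idx)
print(identical(has_non_ascii, byte_check))
```

Calling grep("[^ -~]", x) returns the same indices directly. One caveat: [^ -~] also flags ASCII control characters such as tabs and newlines, since they fall outside the space-to-tilde range, so the byte-based check may be the better fit if those should count as ASCII.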