 

I would like to use gsub in R to match all items which are not alphanumeric

Tags:

regex

r

gsub

I am searching raw Twitter snippets using R but keep running into problems with non-alphanumeric characters such as "🏄".

I would like to remove every character that is not in [abcdefghijklmnopqrstuvwxyz0123456789] using gsub.

Can gsub replace the characters that are NOT in [abcdefghijklmnopqrstuvwxyz0123456789]?

Jon Yates asked Jan 13 '23 19:01


1 Answer

You could simply negate your pattern with a negated character class, [^...]:

x <- "abcde🏄fgh"
gsub("[^A-Za-z0-9]", "", x)
# [1] "abcdefgh"

Please note that the POSIX class [:alnum:] is locale-dependent and is not limited to ASCII letters and digits; it also matches the special characters in your example. That's why gsub("[^[:alnum:]]", "", x) doesn't remove them.
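
The pattern above also keeps uppercase letters, whereas the question lists only lowercase letters and digits. As a minimal sketch, the negated class can be built from exactly that set, which avoids character ranges altogether. The variable names (allowed, pattern, tweets) and the second sample string are illustrative assumptions, not part of the original question:

```r
# Build the allowed set literally: "abcdefghijklmnopqrstuvwxyz0123456789"
allowed <- paste(c(letters, 0:9), collapse = "")

# Negate it: match any single character that is NOT in the allowed set
pattern <- paste0("[^", allowed, "]")

# Sample input (illustrative); "\U0001F3C4" is the surfer emoji from the question
tweets <- c("abcde\U0001F3C4fgh", "Hello, World! 123")

# Strict version: uppercase letters are removed as well
cleaned <- gsub(pattern, "", tweets)
cleaned
# [1] "abcdefgh"    "elloorld123"

# Lowercasing first keeps the letters that would otherwise be dropped
cleaned_lower <- gsub(pattern, "", tolower(tweets))
cleaned_lower
# [1] "abcdefgh"      "helloworld123"
```

Whether to lowercase first depends on whether case matters for the later search; gsub is vectorised, so the same call works on a whole column of snippets.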

sgibb answered Jan 19 '23 01:01