I am searching raw Twitter snippets in R but keep running into problems with non-standard, non-alphanumeric characters such as "🏄".
I would like to remove every character that is not in [abcdefghijklmnopqrstuvwxyz0123456789] using gsub.
Can gsub be told to replace the characters that are NOT in [abcdefghijklmnopqrstuvwxyz0123456789]?
You could simply negate your pattern with [^...]:
x <- "abcde🏄fgh"
gsub("[^A-Za-z0-9]", "", x)
# [1] "abcdefgh"
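The same negated class works on a whole character vector. Here is a minimal sketch, assuming a made-up `tweets` vector (the sample strings and the variable names are hypothetical, not from the question). It also shows a lowercase-only variant, since the question lists only a-z and 0-9:

```r
# Hypothetical sample input; "\U0001F3C4" is the surfer emoji from the question
tweets <- c("Surfs Up \U0001F3C4 today!", "abc_123")

# Keep ASCII letters (both cases) and digits; drop everything else,
# including spaces, punctuation, and the emoji
keep_alnum <- gsub("[^A-Za-z0-9]", "", tweets)

# Lowercase first, then keep only a-z and 0-9, as listed in the question
keep_lower <- gsub("[^a-z0-9]", "", tolower(tweets))

print(keep_alnum)
print(keep_lower)
```

If spaces between words should survive, adding a space inside the brackets ("[^A-Za-z0-9 ]") should keep them.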
Please note that the class [:alnum:] is locale-dependent and can match the special characters in your input. That's why gsub("[^[:alnum:]]", "", x) doesn't work here.
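To see how the two patterns behave on your own machine, a small comparison can help. This is only a sketch: the variable names are made up, and the result of the [:alnum:] version is deliberately not stated because it may differ by platform and locale.

```r
# Same test string as above, with the emoji written as a Unicode escape
x <- "abcde\U0001F3C4fgh"

# Explicit ASCII ranges: do not rely on the locale's notion of "alphanumeric"
explicit <- gsub("[^A-Za-z0-9]", "", x)

# POSIX class: what counts as alphanumeric may vary by platform and locale
posix <- gsub("[^[:alnum:]]", "", x)

print(explicit)
print(posix)

# TRUE means [:alnum:] did not treat the emoji as alphanumeric on this system
print(identical(explicit, posix))
```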