
Removing odd characters in R with gsub

Tags: r, gsub

I am doing some text analysis and want to keep only alphanumeric characters, but I am having trouble removing some characters that I don't consider alphanumeric. Here's an example of what I am dealing with:

letters <- "ՄĄՄdasdas"
letters <- gsub("[^[:alnum:]]", "", letters)
letters

[1] "ՄĄՄdasdas"

What am I doing wrong here?

theamateurdataanalyst asked Jan 10 '23 14:01


2 Answers

@konvas shows you how to use gsub correctly in this situation. The problem with your attempt is that those non-ASCII characters are considered alphabetic characters in your locale, so [:alnum:] matches them and the negated class leaves them in place. Another option is to use iconv:

iconv(letters, to='ASCII', sub='')
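
As a rough sketch of the iconv approach, assuming the input is UTF-8: the variable names `x` and `ascii_only` and the explicit `from = "UTF-8"` argument are additions for illustration, not part of the answer above. The string is written with \u escapes so the example does not depend on the encoding of the script file.

```r
# Same string as in the question, written with Unicode escapes.
# `x` is used instead of `letters` to avoid masking R's built-in `letters`.
x <- "\u0544\u0104\u0544dasdas"

# Check whether the current locale treats a non-ASCII letter as alphabetic.
# The result of this call is locale-dependent.
grepl("[[:alpha:]]", "\u0104")

# Convert to ASCII; anything that cannot be represented is replaced by sub = "",
# which drops it.
ascii_only <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")
ascii_only
```

Note that iconv only drops non-ASCII characters. ASCII punctuation and whitespace are kept, so a gsub step is still needed if only letters and digits should remain.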
Matthew Plourde answered Jan 12 '23 08:01


Try gsub("[^A-Za-z0-9]", "", letters)
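
A small sketch of this pattern on a slightly longer input, to show that it also removes spaces and punctuation. The sample string and the names `x` and `cleaned` are made up for illustration, and `perl = TRUE` is an addition here (not part of the answer) so that the ranges are compared by code point rather than by locale collation.

```r
# Hypothetical input: the string from the question plus a space, digits and "!".
x <- "\u0544\u0104\u0544dasdas 123!"

# Keep only ASCII letters and digits; everything else is replaced by "".
cleaned <- gsub("[^A-Za-z0-9]", "", x, perl = TRUE)
cleaned
```

Unlike [:alnum:], the explicit ranges do not expand to non-ASCII letters such as Ą, which is why this version removes them.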

konvas answered Jan 12 '23 08:01