Remove special characters from data frame

I have a matrix that contains the string "Energy per �m". Before the 'm' there is a diamond-shaped symbol with a question mark in it; I don't know what it is.

I have tried to get rid of it by using this on the column of the matrix:

a=gsub('Energy per �m','',a) 

(copying and pasting the first argument of gsub), but it does not work: I get [unexpected symbol in "a=rep(5,Energy per"]. When I try to extract something from the original matrix with grepl, I get:

46: In grepl("ref. value", raw$parameter) :
input string 15318 is invalid in this locale
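A minimal sketch for locating and identifying the offending entries. It assumes the mystery symbol is a single latin1 byte such as 0xB5 (the micro sign, which would make the original text "Energy per µm"); that is a guess, so inspect your own bytes. `validUTF8()` (base R, 3.3.0 or later) flags strings that are not valid UTF-8, and `charToRaw()` shows a string's raw bytes. The data frame below is hypothetical.

```r
# Hypothetical data: the second entry holds the single latin1 byte 0xB5
# (assumed here to be the micro sign), which is not valid UTF-8.
raw <- data.frame(
  parameter = c("ref. value", "Energy per \xb5m", "Length"),
  stringsAsFactors = FALSE
)

# Indices of entries that are not valid UTF-8 (base R >= 3.3.0)
bad <- which(!validUTF8(raw$parameter))
print(bad)  # 2

# Raw bytes of the first offending entry; the non-ASCII byte identifies the character
bytes <- charToRaw(raw$parameter[bad[1]])
print(bytes)
```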

How can I get rid of all characters like this? I would like to keep only 0-9, A-Z, a-z, / and '; everything else can be removed.

asked Aug 15 '12 at 14:08 by Henk



1 Answer

There is probably a better way to do this than with a regex (e.g. by changing the encoding).
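One way to follow the encoding hint, as a sketch: if the data were written in latin1 (an assumption; check the file's real encoding), `iconv()` can either convert the strings to UTF-8, keeping the character, or convert them to ASCII with `sub = ""`, which drops every byte that has no ASCII equivalent.

```r
# Hypothetical input, assuming the source file was latin1-encoded
x <- "Energy per \xb5m"
Encoding(x) <- "latin1"

# Option A: re-encode to UTF-8 and keep the character (U+00B5, micro sign)
utf8 <- iconv(x, from = "latin1", to = "UTF-8")

# Option B: convert to ASCII and drop anything without an ASCII equivalent
ascii <- iconv(x, from = "latin1", to = "ASCII", sub = "")
print(ascii)  # "Energy per m"
```

Option A is usually preferable when the symbol carries meaning (a unit such as µm), since nothing is lost.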

But here is a regex solution:

gsub("[^0-9A-Za-z/' ]", "", a)
[1] "Energy per m"

But, as @JoshuaUlrich pointed out, you're better off using:

gsub("[^[:alnum:]/' ]", "", a)
[1] "Energy per m"
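Since the question title mentions a data frame, here is a minimal sketch that applies the same whitelist to every character column. The data frame and the `clean_text()` helper are hypothetical. `useBytes = TRUE` makes `gsub()` match byte by byte, so strings that are invalid in the current locale are still processed; note that it also strips every byte of a multibyte UTF-8 character such as µ.

```r
# Hypothetical helper: keep ASCII letters, digits, space, / and '.
# useBytes = TRUE matches byte by byte, so strings invalid in the locale are still processed.
clean_text <- function(v) {
  gsub("[^0-9A-Za-z/' ]", "", v, useBytes = TRUE)
}

# Hypothetical data frame with one mis-encoded entry
raw <- data.frame(
  parameter = c("Length", "Energy per \xb5m"),
  value = c(1.5, 2),
  stringsAsFactors = FALSE
)

# Apply the helper to character columns only; numeric columns are untouched
is_chr <- vapply(raw, is.character, logical(1))
raw[is_chr] <- lapply(raw[is_chr], clean_text)
print(raw$parameter)
```

Because `.` is not in the whitelist, an entry such as "ref. value" would become "ref value"; add `.` inside the brackets if you need to keep it.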
answered Oct 16 '22 at 02:10 by Andrie