In R, use gsub to remove all punctuation except period

I am new to R so I hope you can help me.

I want to use gsub to remove all punctuation except for periods and minus signs so I can keep decimal points and negative symbols in my data.

Example

My data frame z has the following data:

     [,1] [,2]
[1,] "1"  "6"
[2,] "2@" "7.235"
[3,] "3"  "8"
[4,] "4"  "$9"
[5,] "£5" "-10"

I want to use gsub("[[:punct:]]", "", z) to remove the punctuation.

Current output

> gsub("[[:punct:]]", "", z)
     [,1] [,2]  
[1,] "1"  "6"   
[2,] "2"  "7235"
[3,] "3"  "8"   
[4,] "4"  "9"   
[5,] "5"  "10" 

I would like, however, to keep the "-" sign and the "." sign.

Desired output

 PSEUDO CODE:  
> gsub("[[:punct:]]", "", z, except(".", "-") )
         [,1] [,2]  
    [1,] "1"  "6"   
    [2,] "2"  "7.235"
    [3,] "3"  "8"   
    [4,] "4"  "9"   
    [5,] "5"  "-10" 

Any ideas how I can make some characters exempt from the gsub() function?

asked Feb 03 '14 17:02 by Crayon Constantinople

1 Answer

You can capture the characters you want to keep and put them back in the replacement:

 gsub("([.-])|[[:punct:]]", "\\1", as.matrix(z))
     X..1. X..2.  
[1,] "1"   "6"    
[2,] "2"   "7.235"
[3,] "3"   "8"    
[4,] "4"   "9"    
[5,] "5"   "-10"  

Here I am keeping the . and -.
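
To spell out why this works: a period or minus sign is captured by `([.-])` and written back by `\\1`, while any other punctuation matches `[[:punct:]]`, leaves group 1 empty, and is replaced with nothing. Below is a minimal sketch on a plain character vector. The vector `x` and its values are made up for illustration, and the second pattern is an alternative I am assuming you may prefer, not part of the approach above: a negative lookahead, which needs `perl = TRUE`.

```r
# Hypothetical sample values (ASCII only), not the asker's data
x <- c("2@", "7.235", "$9", "-10", "1,234.5")

# Option 1: capture "." and "-" in group 1 and put them back with \\1;
# other punctuation matches the second alternative and is dropped
keep_capture <- gsub("([.-])|[[:punct:]]", "\\1", x)

# Option 2: negative lookahead refuses to match "." or "-",
# so only the remaining punctuation is removed (PCRE only)
keep_lookahead <- gsub("(?![.-])[[:punct:]]", "", x, perl = TRUE)

print(keep_capture)
print(identical(keep_capture, keep_lookahead))
```

Both patterns should give the same result on ASCII input. Non-ASCII symbols such as `£` may be treated differently by the two regex engines depending on locale, so check them on your own data.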

I guess the next step is to coerce the result to a numeric matrix, so here I combine the two steps:

matrix(as.numeric(gsub("([.-])|[[:punct:]]", "\\1", as.matrix(z))),ncol=2)
   [,1]    [,2]
[1,]    1   6.000
[2,]    2   7.235
[3,]    3   8.000
[4,]    4   9.000
[5,]    5 -10.000
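
If you would rather keep a data frame than a matrix, the same cleaning step can be applied column by column. This is only a sketch under assumptions: the data frame `z` below is a hypothetical ASCII-only stand-in for the question's data, and `clean_numeric` is a helper name invented for this example.

```r
# Hypothetical data frame standing in for the question's data
z <- data.frame(a = c("1", "2@", "3", "4", "5"),
                b = c("6", "7.235", "8", "$9", "-10"),
                stringsAsFactors = FALSE)

# Illustrative helper: strip punctuation except "." and "-", then convert
clean_numeric <- function(col) {
  as.numeric(gsub("([.-])|[[:punct:]]", "\\1", col))
}

# z[] keeps the data frame structure while replacing every column
z[] <- lapply(z, clean_numeric)
str(z)
```

Note that `as.numeric()` returns `NA` with a warning for strings that still do not parse after cleaning, such as `"1-2"` or `"1.2.3"`, so it is worth checking the result for `NA` values.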

answered Oct 25 '22 20:10 by agstudy