I am new to R so I hope you can help me.
I want to use gsub to remove all punctuation except for periods and minus signs so I can keep decimal points and negative symbols in my data.
Example
My data frame z has the following data:
[,1] [,2]
[1,] "1" "6"
[2,] "2@" "7.235"
[3,] "3" "8"
[4,] "4" "$9"
[5,] "£5" "-10"
I want to use gsub("[[:punct:]]", "", z)
to remove the punctuation.
Current output
> gsub("[[:punct:]]", "", z)
[,1] [,2]
[1,] "1" "6"
[2,] "2" "7235"
[3,] "3" "8"
[4,] "4" "9"
[5,] "5" "10"
I would like, however, to keep the "-" sign and the "." sign.
Desired output
PSEUDO CODE:
> gsub("[[:punct:]]", "", z, except(".", "-") )
[,1] [,2]
[1,] "1" "6"
[2,] "2" "7.235"
[3,] "3" "8"
[4,] "4" "9"
[5,] "5" "-10"
Any ideas how I can make some characters exempt from the gsub() function?
Using the [[:punct:]] regexp class will ensure you really do remove all punctuation. And it can be done entirely within R.
You can use this: Regex. Replace("This is a test string, with lots of: punctuations; in it?!.", @"[^\w\s]", "");
We can use the JavaScript string replace method with a regex that matches the patterns in a string that we want to replace. So we can use it to remove punctuation by matching the punctuation and replacing them all with empty strings.
You can put back some matches like this:
sub("([.-])|[[:punct:]]", "\\1", as.matrix(z))
X..1. X..2.
[1,] "1" "6"
[2,] "2" "7.235"
[3,] "3" "8"
[4,] "4" "9"
[5,] "5" "-10"
Here I am keeping the .
and -
.
And I guess , the next step is to coerce you result to a numeric matrix, SO here I combine the 2 steps like this:
matrix(as.numeric(sub("([.-])|[[:punct:]]", "\\1", as.matrix(z))),ncol=2)
[,1] [,2]
[1,] 1 6.000
[2,] 2 7.235
[3,] 3 8.000
[4,] 4 9.000
[5,] 5 -10.000
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With