
Removing punctuation from text using R

Tags: r

I need to remove punctuation from some text. I am using the tm package, but there is a catch.

For example, the text looks something like this:

data <- '"I am a, new comer","to r,"please help","me:out","here"'

Now when I run

library(tm)
data <- removePunctuation(data)

in my code, the result is:

I am a new comerto rplease helpmeouthere

but what I expect is:

I am a new comer to r please help me out here
asked Mar 17 '15 at 12:03 by SHRUTAYU Kale



2 Answers

Here's how I read your question, along with an answer that is very close to @David Arenburg's in the comment above.

 data <- '"I am a, new comer","to r,"please help","me:out","here"'
 gsub('[[:punct:] ]+',' ',data)
 [1] " I am a new comer to r please help me out here "

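The space inside the brackets, after [:punct:], adds the space character to the set being matched, and the + matches one or more consecutive characters from that set. Each such run of punctuation and spaces is replaced by a single space, which has the side effect, desirable in some cases, of collapsing any sequence of spaces into one. It also leaves a leading and a trailing space here, where the outer quotes were.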
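
If the leading and trailing spaces are unwanted, one possible follow-up is to trim the result. This is only a sketch on the same input, assuming base R 3.2.0 or later for trimws(); the names spaced and clean are made up for illustration.

```r
# Same input as above: one string containing quotes, commas and a colon.
data <- '"I am a, new comer","to r,"please help","me:out","here"'

# Replace each run of punctuation and/or spaces with a single space.
spaced <- gsub("[[:punct:] ]+", " ", data)

# trimws() strips the leading and trailing space left behind
# where the opening and closing quotes used to be.
clean <- trimws(spaced)
print(clean)
```

Keeping the two steps separate makes it easy to drop the trimming if the padding is harmless for later processing.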

answered Oct 12 '22 at 01:10 by PeterK


If you had something like

> string <- "hello,you"
> string
[1] "hello,you"

You could do this:

> gsub(",", "", string)
[1] "helloyou"

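It replaces every "," with "" (the empty string) in the variable called string. gsub returns the modified copy and leaves string itself unchanged, so assign the result if you want to keep it.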
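
Replacing with an empty string joins the neighbouring words, which is the effect the question is trying to avoid. A small sketch of the difference, assuming a space is the separator you want; fixed = TRUE is optional here and just makes gsub treat "," as a literal character, and the names joined and spaced are made up for illustration.

```r
string <- "hello,you"

# Comma dropped: the two words run together.
joined <- gsub(",", "", string, fixed = TRUE)

# Comma replaced by a space: the words stay separate.
spaced <- gsub(",", " ", string, fixed = TRUE)

print(joined)
print(spaced)
```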

answered Oct 12 '22 at 02:10 by Dominic