I have been doing text mining in R recently, but I am having trouble with data preprocessing. I have a string like this:
"I want to buy 3D printer, but it costs 3000 dollars."
I want to keep the word "3D" but remove "3000", so the result should look like this:
"I want to buy 3D printer, but it costs dollars."
I used corpus <- tm_map(corpus, removeNumbers)
but this removes every number in the text, so the result contains the term "D printer" when it should be "3D printer".
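To see why "3D" turns into "D", here is a minimal base-R sketch. It assumes that removeNumbers behaves roughly like deleting every run of digit characters wherever it appears, even inside a word. The helper name remove_all_digits is hypothetical and is not part of tm.

```r
# Hypothetical illustration: assume removeNumbers behaves roughly like
# deleting every run of digit characters, wherever it appears.
str1 <- "I want to buy 3D printer, but it costs 3000 dollars."

remove_all_digits <- function(x) {
  # [[:digit:]]+ matches any run of digits, including the "3" inside "3D"
  gsub("[[:digit:]]+", "", x)
}

remove_all_digits(str1)
#[1] "I want to buy D printer, but it costs  dollars."
```

Because the digit is stripped regardless of the letters around it, "3D" loses its "3", and "3000" leaves a double space behind.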
Is there any way to fix this problem? Thanks!
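We can use gsub with a pattern that matches only this particular number, i.e. a 3 followed by one or more digits and a space:
str1 <- "I want to buy 3D printer, but it costs 3000 dollars."
gsub('3\\d+\\s', '', str1)
#[1] "I want to buy 3D printer, but it costs dollars."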
If this needs to be more general, remove any run of digits that starts at a word boundary and is followed by whitespace:
gsub('\\b\\d+\\s', '', str1)
#[1] "I want to buy 3D printer, but it costs dollars."
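One limitation of '\\b\\d+\\s' is that it requires whitespace after the number, so a number at the end of a sentence (e.g. "It costs 3000.") is not removed. A possible variant, sketched below, removes only tokens made entirely of digits (word boundaries on both sides) together with any whitespace in front of them. The function name remove_standalone_numbers is hypothetical, and this is an alternative pattern, not the one from the answer above.

```r
# Hypothetical helper: drop tokens made only of digits (e.g. "3000")
# while keeping alphanumeric tokens such as "3D".
remove_standalone_numbers <- function(x) {
  # \\s*      : also consume the whitespace before the number
  # \\b\\d+\\b : a digit run with a word boundary on both sides; "3D" does
  #             not match because there is no boundary between "3" and "D"
  x <- gsub("\\s*\\b\\d+\\b", "", x, perl = TRUE)
  # remove any whitespace left at the start or end of the string
  trimws(x)
}

remove_standalone_numbers("I want to buy 3D printer, but it costs 3000 dollars.")
#[1] "I want to buy 3D printer, but it costs dollars."
remove_standalone_numbers("It costs 3000.")
#[1] "It costs."
```

Since it is an ordinary character function, it could be applied to a corpus with tm_map(corpus, content_transformer(remove_standalone_numbers)). Numbers containing separators, such as "3,000" or "3.5", would leave the comma or decimal point behind.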