
Base word stemming instead of root word stemming in R

Tags:

r

nlp

stemming

Is there any way to get the base word instead of the root word when stemming text with NLP tools in R?

Code:

> #Loading libraries
> library(tm)
> library(slam)
> 
> #Vector
> Vec=c("happyness happies happys","sky skies")
> 
> #Creating Corpus
> Txt=Corpus(VectorSource(Vec))
> 
> #Stemming
> Txt=tm_map(Txt, stemDocument)
> 
> #Checking result
> inspect(Txt)
A corpus with 2 text documents

The metadata consists of 2 tag-value pairs and a data frame
Available tags are:
  create_date creator 
Available variables in the data frame are:
  MetaID 

[[1]]
happi happi happi

[[2]]
sky sky

> 

Can I get the base word "happy" instead of the root word "happi" for "happyness happies happys" using R?

AVSuresh asked Jul 12 '11 13:07


1 Answer

You're probably looking for a different stemmer. Here are some stemmers listed in the CRAN Task View: Natural Language Processing:

  • RWeka is an interface to Weka, a collection of machine learning algorithms for data mining tasks written in Java. Especially useful in the context of natural language processing is its functionality for tokenization and stemming.

  • Snowball provides the Snowball stemmers which contain the Porter stemmer and several other stemmers for different languages. See the Snowball webpage for details.

  • Rstem is an alternative interface to a C version of Porter's word stemming algorithm.
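
As a rough illustration of how one of these stemmers might be combined with tm to get back to a readable word, here is a minimal sketch. It assumes the SnowballC package (a C-based interface to the Snowball stemmers, not named in the list above) and tm's stemCompletion(); the dictionary vector is a hypothetical stand-in for whatever list of acceptable base words you have. No output is shown because the exact stems depend on the stemmer version.

```r
# Assumption: the SnowballC and tm packages are installed.
library(SnowballC)
library(tm)

words <- c("happyness", "happies", "happys", "sky", "skies")

# Stem each word; wordStem() returns a character vector of the same length.
stems <- wordStem(words, language = "english")

# Hypothetical dictionary of full words accepted as "base" forms.
dictionary <- c("happy", "sky")

# stemCompletion() tries to map each stem to a dictionary entry
# that begins with that stem; "shortest" picks the shortest match.
completed <- stemCompletion(stems, dictionary, type = "shortest")

# Show the original words, their stems, and the completions side by side.
print(data.frame(word = words, stem = stems,
                 completed = unname(completed), row.names = NULL))
```

Note that the completion appears to be prefix-based, so if the stemmer produces "happi", a dictionary entry "happy" will not match it. In that case a dictionary-based lemmatizer, rather than a stemmer, is probably closer to what the question asks for.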

cyborg answered Oct 11 '22 07:10