
Stemming English words with Lucene

I'm processing some English texts in a Java application, and I need to stem them. For example, from the text "amenities/amenity" I need to get "amenit".

The function looks like:

String stemTerm(String term) { ... }

I've found the Lucene Analyzer, but it looks way too complicated for what I need. http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/analysis/PorterStemFilter.html

Is there a way to use it to stem words without building an Analyzer? I don't understand all the Analyzer business...

EDIT: I actually need stemming + lemmatization. Can Lucene do this?

Mulone asked Mar 22 '11 at 13:03




1 Answer

SnowballAnalyzer is deprecated. You can use the Porter stemmer that ships with Lucene directly instead, with no Analyzer needed:

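 // Snowball-generated stemmer bundled with Lucene's analyzers module
 // (package assumed from the setCurrent/getCurrent API used here)
 import org.tartarus.snowball.ext.PorterStemmer;

 PorterStemmer stem = new PorterStemmer();
 stem.setCurrent(word);              // word is the term to stem
 stem.stem();                        // stems the current word in place
 String result = stem.getCurrent();  // read back the stemmed form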
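On the EDIT about lemmatization: a Porter-style stemmer only strips suffixes by rule, while a lemmatizer maps a word to its dictionary form, which needs a lookup. Below is a rough, Lucene-free sketch of that difference. It is not the Lucene code: `porterStep1a` implements only step 1a (plural stripping) of the Porter algorithm, and `LEMMAS` is a tiny hand-made dictionary invented for illustration.

```java
import java.util.Map;

public class StemVsLemma {

    // Step 1a of the Porter algorithm only: SSES -> SS, IES -> I, SS -> SS, S -> "".
    // The full algorithm applies several further steps after this one.
    static String porterStep1a(String w) {
        if (w.endsWith("sses")) return w.substring(0, w.length() - 2);
        if (w.endsWith("ies"))  return w.substring(0, w.length() - 2);
        if (w.endsWith("ss"))   return w;
        if (w.endsWith("s"))    return w.substring(0, w.length() - 1);
        return w;
    }

    // Hypothetical toy dictionary; a real lemmatizer needs a full lexicon.
    static final Map<String, String> LEMMAS = Map.of(
            "amenities", "amenity",
            "mice", "mouse");

    // Dictionary lookup, falling back to the word itself when it is unknown.
    static String lemmatize(String w) {
        return LEMMAS.getOrDefault(w, w);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"amenities", "caresses", "mice"}) {
            System.out.println(w + " -> stem: " + porterStep1a(w)
                    + ", lemma: " + lemmatize(w));
        }
    }
}
```

Rule-based stripping leaves "mice" untouched and turns "amenities" into a non-word, whereas the lookup returns real words but only for entries it knows about.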

Hope this helps!

arbc answered Sep 17 '22 at 18:09