I'm processing some English texts in a Java application, and I need to stem them. For example, from the text "amenities/amenity" I need to get "amenit".
The function looks like:
String stemTerm(String term){ ... }
I've found the Lucene Analyzer, but it looks way too complicated for what I need. http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/analysis/PorterStemFilter.html
Is there a way to use it to stem words without building an Analyzer? I don't understand all the Analyzer business...
EDIT: I actually need stemming + lemmatization. Can Lucene do this?
They can be used for stemming and stop word removal. It's a simple and effective means of stemming. Since the PorterStemmer is not public, we can't call the stem function of PorterStemmer.
Overview. Lucene Analyzers are used to analyze text while indexing and searching documents. We mentioned analyzers briefly in our introductory tutorial. In this tutorial, we'll discuss commonly used Analyzers, how to construct our custom analyzer and how to assign different analyzers for different document fields.
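SnowballAnalyzer is deprecated; you can use the Snowball PorterStemmer that ships with Lucene instead:
import org.tartarus.snowball.ext.PorterStemmer;

PorterStemmer stem = new PorterStemmer();
stem.setCurrent(word);   // word is the term to stem
stem.stem();
String result = stem.getCurrent();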
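The stemmer calls above work on a single word, while the question's input ("amenities/amenity") holds several terms. Here is a rough sketch of how the splitting and the stemming could be kept apart. The class name StemTermSketch and the methods stemTerms and toyStem are made up for illustration. toyStem is a deliberately naive suffix stripper, not the Porter algorithm, and is only there so the example runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

public class StemTermSketch {

    // Toy stand-in for a real stemmer: strips a few common endings.
    // This is NOT the Porter algorithm; it only keeps the sketch dependency-free.
    static String toyStem(String word) {
        if (word.length() > 4 && word.endsWith("ies")) {
            return word.substring(0, word.length() - 3);
        }
        if (word.length() > 3 && word.endsWith("y")) {
            return word.substring(0, word.length() - 1);
        }
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Splits the input on anything that is not a letter, lower-cases each
    // token, and applies the supplied stemmer to it.
    static List<String> stemTerms(String text, UnaryOperator<String> stemmer) {
        List<String> stems = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                stems.add(stemmer.apply(token.toLowerCase(Locale.ROOT)));
            }
        }
        return stems;
    }

    public static void main(String[] args) {
        // prints [amenit, amenit] with the toy stemmer
        System.out.println(stemTerms("amenities/amenity", StemTermSketch::toyStem));
    }
}
```

With Lucene available, the toyStem argument could be swapped for a lambda that wraps the setCurrent / stem / getCurrent calls shown above. A real Porter stemmer will not necessarily produce the same stems as this toy version.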
Hope this helps!