How to use a Lucene Analyzer to tokenize a String?

Tags: java, lucene, tokenize, analyzer

Is there a simple way I could use any subclass of Lucene's Analyzer to parse/tokenize a String?

Something like:

```java
String to_be_parsed = "car window seven";
Analyzer analyzer = new StandardAnalyzer(...);
List<String> tokenized_string = analyzer.analyze(to_be_parsed);
```

835

asked Jun 13 '11 18:06

Felipe Hummel

2 Answers

Based on the other answer here (by stevevls), this is slightly modified to work with Lucene 4.0.

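```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class LuceneUtil {

  private LuceneUtil() {}

  public static List<String> tokenizeString(Analyzer analyzer, String string) {
    List<String> result = new ArrayList<String>();
    // try-with-resources (Java 7+) closes the stream when the loop is done
    try (TokenStream stream = analyzer.tokenStream(null, new StringReader(string))) {
      CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
      stream.reset(); // must be called before the first incrementToken()
      while (stream.incrementToken()) {
        result.add(term.toString());
      }
      stream.end(); // end-of-stream step, called after the last token
    } catch (IOException e) {
      // not thrown b/c we're using a string reader...
      throw new RuntimeException(e);
    }
    return result;
  }
}
```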

198

answered Sep 28 '22 11:09

Ben McCann

As far as I know, you have to write the loop yourself. Something like this (taken straight from my source tree):

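```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public final class LuceneUtils {

    public static List<String> parseKeywords(Analyzer analyzer, String field, String keywords) {
        List<String> result = new ArrayList<String>();
        TokenStream stream = analyzer.tokenStream(field, new StringReader(keywords));
        // TermAttribute is the Lucene 3.0-era API; Lucene 4.0 replaced it with CharTermAttribute
        try {
            while (stream.incrementToken()) {
                result.add(stream.getAttribute(TermAttribute.class).term());
            }
        } catch (IOException e) {
            // not thrown b/c we're using a string reader...
            throw new RuntimeException(e);
        }
        return result;
    }
}
```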

answered Sep 28 '22 13:09

stevevls
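For anyone unsure what "write the loop yourself" involves, here is a minimal sketch of the call order that the `TokenStream` javadoc describes: `reset()`, then `incrementToken()` until it returns false, then `end()`, then `close()`. It uses no Lucene classes, so it runs without the library on the classpath. `SketchTokenStream` and `TokenLoopSketch` are hypothetical names made up for this illustration, and the whitespace splitting with lowercasing only stands in for what a real analyzer would do.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a Lucene TokenStream. This is NOT a Lucene class;
// it only mimics the reset() / incrementToken() / end() / close() call order.
final class SketchTokenStream implements Closeable {
    private final String[] parts;
    private int next = -1; // -1 means reset() has not been called yet
    private String term;

    SketchTokenStream(String text) {
        String trimmed = text.trim();
        // Splitting an empty string would yield one empty token, so handle it here.
        this.parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    void reset() {
        next = 0;
    }

    boolean incrementToken() {
        if (next < 0) {
            throw new IllegalStateException("call reset() before incrementToken()");
        }
        if (next >= parts.length) {
            return false;
        }
        term = parts[next++].toLowerCase(Locale.ROOT);
        return true;
    }

    String term() {
        return term;
    }

    void end() {
        // Nothing to finalize in this sketch.
    }

    @Override
    public void close() {
        next = -1; // a closed stream must be reset() again before reuse
    }
}

final class TokenLoopSketch {
    // Same shape as the loops in the answers: reset, consume, end, close.
    static List<String> tokenize(String text) {
        List<String> result = new ArrayList<>();
        try (SketchTokenStream stream = new SketchTokenStream(text)) {
            stream.reset();
            while (stream.incrementToken()) {
                result.add(stream.term());
            }
            stream.end();
        }
        return result;
    }
}
```

Calling `TokenLoopSketch.tokenize("car window seven")` collects the three words into a list, which is the shape of result the question asks for. With a real `Analyzer` the loop is the same, but the tokens depend on the analyzer's own filters.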
