
How to do Java String matching using Boolean Search Syntax?

I'm looking for a Java/Scala library that takes a user query and a text and returns whether the text matches the query.

I'm processing a stream of information (e.g. the Twitter stream) and can't afford a batch process. I need to evaluate each tweet in real time, instead of indexing it through Lucene RAMDisk and querying it later.

It's possible to create a parser/lexer using ANTLR, but this is such a common use case that I can't believe nobody has created a library for it before.

Some samples from the TextQuery Ruby library, which does exactly what I need:

    TextQuery.new("'to be' OR NOT 'to_be'").match?("to be")   # => true

    TextQuery.new("-test").match?("some string of text")      # => true
    TextQuery.new("NOT test").match?("some string of text")   # => true

    TextQuery.new("a AND b").match?("b a")                    # => true
    TextQuery.new("a AND b").match?("a c")                    # => false

    q = TextQuery.new("a AND (b AND NOT (c OR d))")
    q.match?("d a b")                                         # => false
    q.match?("b")                                             # => false
    q.match?("a b cdefg")                                     # => true

    TextQuery.new("a~").match?("adf")                         # => true
    TextQuery.new("~a").match?("dfa")                         # => true
    TextQuery.new("~a~").match?("daf")                        # => true
    TextQuery.new("2~a~1").match?("edaf")                     # => true
    TextQuery.new("2~a~2").match?("edaf")                     # => false

    TextQuery.new("a", :ignorecase => true).match?("A b cD")  # => true

Since it is implemented in Ruby, it's not suitable for my platform, and I can't use JRuby just for this part of our solution.

I found a similar question but couldn't get an answer from it: Boolean Query / Expression to a Concrete syntax tree

Thanks!

asked Apr 07 '12 15:04 by arjones


1 Answer

Given that you are doing text search, I would try to leverage some of the infrastructure provided by Lucene. Maybe you could create a QueryParser and call parse to get back a Query. Subclasses of Query include:

TermQuery
MultiTermQuery
BooleanQuery
WildcardQuery
PhraseQuery
PrefixQuery
MultiPhraseQuery
FuzzyQuery
TermRangeQuery
NumericRangeQuery
SpanQuery

Then you may be able to use pattern matching to implement what a match means for your application:

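    import org.apache.lucene.search.{BooleanClause, BooleanQuery, Query, TermQuery}
    import scala.collection.JavaConverters._

    def match_?(tweet: String, query: Query): Boolean = query match {
      case q: TermQuery => tweet.contains(q.getTerm.text)
      case q: BooleanQuery =>
        // return true if all MUST clauses are satisfied, calling match_? recursively
        // (SHOULD and MUST_NOT clauses are ignored in this sketch)
        q.clauses.asScala
          .filter(_.getOccur == BooleanClause.Occur.MUST)
          .forall(c => match_?(tweet, c.getQuery))
      // you need to cover all subclasses above
      case _ => false
    }

    // queryParser is a Lucene QueryParser instance created elsewhere
    val q = queryParser.parse(userQuery)
    val res = match_?(tweet, q)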
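The BooleanQuery case is where most of the work lies, so here is a minimal Java sketch of the clause logic alone. It assumes that all MUST clauses have to match, that no MUST_NOT clause may match, and that at least one SHOULD clause has to match when there is no MUST clause, which is how I understand Lucene's default behaviour. The `ClauseMatcher`, `Node`, `Term`, `Clause` and `Bool` types are hypothetical stand-ins, not Lucene classes, so that the example runs without Lucene on the classpath.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-ins for Lucene's query classes so the sketch runs without Lucene. */
final class ClauseMatcher {
    enum Occur { MUST, SHOULD, MUST_NOT }

    interface Node { }

    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
    }

    static final class Clause {
        final Occur occur;
        final Node node;
        Clause(Occur occur, Node node) { this.occur = occur; this.node = node; }
    }

    static final class Bool implements Node {
        final List<Clause> clauses;
        Bool(Clause... clauses) { this.clauses = Arrays.asList(clauses); }
    }

    /** Recursively evaluates the query tree against a single tweet. */
    static boolean matches(String tweet, Node node) {
        if (node instanceof Term) {
            // Substring containment, mirroring tweet.contains in the Scala snippet
            return tweet.contains(((Term) node).text);
        }
        if (node instanceof Bool) {
            boolean sawMust = false;
            boolean shouldHit = false;
            for (Clause c : ((Bool) node).clauses) {
                boolean hit = matches(tweet, c.node);
                switch (c.occur) {
                    case MUST:
                        if (!hit) return false;   // a required clause failed
                        sawMust = true;
                        break;
                    case MUST_NOT:
                        if (hit) return false;    // a prohibited clause matched
                        break;
                    case SHOULD:
                        shouldHit |= hit;
                        break;
                }
            }
            // Assumption: without MUST clauses, at least one SHOULD clause must match
            return sawMust || shouldHit;
        }
        return false; // unsupported node type
    }

    public static void main(String[] args) {
        Node q = new Bool(
            new Clause(Occur.MUST, new Term("java")),
            new Clause(Occur.MUST_NOT, new Term("ruby")));
        System.out.println(matches("java and scala", q)); // true
        System.out.println(matches("java and ruby", q));  // false
    }
}
```

Under these assumptions a query made only of MUST_NOT clauses matches nothing, so a bare `NOT test` would need special handling if you want it to behave like the TextQuery samples in the question.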

Here is an implementation. It surely has bugs, but it is a working proof of concept and you'll get the idea. It reuses the syntax, documentation and grammar of the default Lucene QueryParser.
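If pulling in Lucene is not an option, the hand-written parser route mentioned in the question is also small enough to sketch. The following is a minimal recursive-descent parser in plain Java, not the implementation referred to above and not a port of TextQuery. The class name `BooleanTextQuery` and its `compile` method are hypothetical. It only supports `AND`, `OR`, `NOT`, a leading `-`, parentheses and single-quoted phrases, and it matches by substring containment, so it would not reproduce every TextQuery sample (for example, `c` would match `cdefg`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of any library): compiles a boolean query into a Predicate. */
final class BooleanTextQuery {
    // A token is a 'quoted phrase', a parenthesis, or a run of other non-space characters.
    // Anything else (such as an unpaired quote) is silently skipped in this sketch.
    private static final Pattern TOKEN = Pattern.compile("'[^']*'|[()]|[^\\s()']+");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private BooleanTextQuery(String query) {
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            tokens.add(m.group());
        }
    }

    /** Parses the query once; the returned predicate can be applied to every incoming text. */
    static Predicate<String> compile(String query) {
        BooleanTextQuery parser = new BooleanTextQuery(query);
        Predicate<String> result = parser.parseOr();
        if (parser.pos < parser.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + parser.tokens.get(parser.pos));
        }
        return result;
    }

    // or := and ("OR" and)*
    private Predicate<String> parseOr() {
        Predicate<String> left = parseAnd();
        while (accept("OR")) {
            left = left.or(parseAnd());
        }
        return left;
    }

    // and := unary ("AND" unary)*
    private Predicate<String> parseAnd() {
        Predicate<String> left = parseUnary();
        while (accept("AND")) {
            left = left.and(parseUnary());
        }
        return left;
    }

    // unary := "NOT" unary | "(" or ")" | term
    private Predicate<String> parseUnary() {
        if (accept("NOT")) {
            return parseUnary().negate();
        }
        if (accept("(")) {
            Predicate<String> inner = parseOr();
            if (!accept(")")) {
                throw new IllegalArgumentException("Missing ')'");
            }
            return inner;
        }
        return parseTerm();
    }

    // term := 'quoted phrase' | -word | word, all matched by substring containment
    private Predicate<String> parseTerm() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of query");
        }
        String tok = tokens.get(pos++);
        if (tok.equals(")") || tok.equals("AND") || tok.equals("OR")) {
            throw new IllegalArgumentException("Expected a term but found: " + tok);
        }
        if (tok.startsWith("'")) {
            String phrase = tok.substring(1, tok.length() - 1);
            return text -> text.contains(phrase);
        }
        if (tok.startsWith("-") && tok.length() > 1) {
            String word = tok.substring(1);
            return text -> !text.contains(word);
        }
        return text -> text.contains(tok);
    }

    // Consumes the next token if it equals the expected one.
    private boolean accept(String expected) {
        if (pos < tokens.size() && tokens.get(pos).equals(expected)) {
            pos++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Predicate<String> q = compile("a AND (b AND NOT (c OR d))");
        System.out.println(q.test("d a b")); // false
        System.out.println(q.test("a b"));   // true
    }
}
```

Because `compile` parses the query once and returns a `Predicate<String>`, the same predicate can be applied to each tweet as it arrives, with no index and no batching.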

answered Oct 27 '22 18:10 by huynhjl