
Multi-Term Wildcard queries in Lucene?

I'm using Zend_Search_Lucene, the PHP port of Java Lucene. I currently have some code that will build a search query based on an array of strings, finding results for which at least one index field matches each of the strings submitted. Simplified, it looks like this:

(Note: $words is an array constructed from user input.)

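// Outer boolean query: every word must match (AND across words).
$query = new Zend_Search_Lucene_Search_Query_Boolean();
foreach ($words as $word) {
  // The same word, looked up in two different index fields.
  $term1 = new Zend_Search_Lucene_Index_Term($word, $fieldname1);
  $term2 = new Zend_Search_Lucene_Index_Term($word, $fieldname2);
  // Terms added without a sign are optional, so either field may match.
  $multiq = new Zend_Search_Lucene_Search_Query_MultiTerm();
  $multiq->addTerm($term1);
  $multiq->addTerm($term2);
  // true marks this subquery as required.
  $query->addSubquery($multiq, true);
}
$hits = $index->find($query);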

What I would like to do is replace $word with ($word . '*'), appending an asterisk to the end of each word to turn it into a wildcard term.

But then, $multiq would have to be a Zend_Search_Lucene_Search_Query_Wildcard instead of a Zend_Search_Lucene_Search_Query_MultiTerm, and I don't think I would still be able to add multiple Index_Terms to each $multiq.

Is there a way to construct a query that's both a Wildcard and a MultiTerm?

Thanks!

sherlock42 asked Jul 02 '09 14:07



1 Answer

Not in the way you're hoping to achieve it, unfortunately:

Lucene supports single and multiple character wildcard searches within single terms (but not within phrase queries).

and even if it were possible, it would probably not be a good idea:

Wildcard, range and fuzzy search queries may match too many terms. It may cause incredible search performance downgrade.

I imagine the way to go, if you insist on multiple wildcard terms, would be to execute two separate searches, one for each wildcarded term, and bundle the results together.
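
As a rough sketch of that "separate searches, then bundle" idea (not tested against a real index): the helper names wildcardHitsPerField, mergeHitsById and findAllWords below are made up for illustration, and the code assumes Zend Framework 1 is loaded, that $index is an already-opened Zend_Search_Lucene index, and that each hit exposes the usual id and score properties. One wildcard query is run per field, the hits are merged by document id, and the per-word results are intersected so that every word still has to match.

```php
<?php
// Hypothetical helper: run one wildcard search per field for a single word.
// Assumes $index is an opened Zend_Search_Lucene index (ZF1 must be loaded).
function wildcardHitsPerField($index, $word, array $fieldNames)
{
    $hitLists = array();
    foreach ($fieldNames as $fieldName) {
        $pattern = new Zend_Search_Lucene_Index_Term($word . '*', $fieldName);
        $hitLists[] = $index->find(
            new Zend_Search_Lucene_Search_Query_Wildcard($pattern)
        );
    }
    return $hitLists;
}

// Hypothetical helper: union several hit lists, keeping one hit per
// document id (the one with the highest score).
function mergeHitsById(array $hitLists)
{
    $merged = array();
    foreach ($hitLists as $hits) {
        foreach ($hits as $hit) {
            if (!isset($merged[$hit->id]) || $hit->score > $merged[$hit->id]->score) {
                $merged[$hit->id] = $hit;
            }
        }
    }
    return $merged;
}

// Hypothetical helper: a document must match every word (in any field),
// so intersect the merged per-word results by document id.
function findAllWords($index, array $words, array $fieldNames)
{
    $result = null;
    foreach ($words as $word) {
        $hitsForWord = mergeHitsById(wildcardHitsPerField($index, $word, $fieldNames));
        $result = ($result === null)
            ? $hitsForWord
            : array_intersect_key($result, $hitsForWord);
    }
    return ($result === null) ? array() : $result;
}
```

Usage would be something like $hits = findAllWords($index, $words, array($fieldname1, $fieldname2)); the result is keyed by document id and is not sorted by score. Also keep in mind that, by default, Zend_Search_Lucene expects at least three non-wildcard characters before the asterisk; Zend_Search_Lucene_Search_Query_Wildcard::setMinPrefixLength() changes that limit.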

karim79 answered Oct 19 '22 12:10
