
In Solr, why use different analyzers for index vs query?

Tags: solr, lucene

Is there a substantial reason to use a different analyzer for indexing vs querying? In the example schema.xml file, for text_en_splitting, the index analyzer doesn't do synonym expansion, but the query one does. Is that just to keep the index as small as possible? Similarly, for WordDelimiterFilterFactory, the index analyzer has catenateWords="1" and catenateNumbers="1", while the query analyzer has them set to 0. Is that just to keep the query small (and fast)? Are these optimizations really worth the maintenance nightmare of two analyzers that are "nearly identical"?

Thanks!

Chung Wu asked Feb 21 '23 19:02


1 Answer

You don't need synonym expansion at both index time and query time, only at one of the two. Think about it: if you expand only while indexing, every listed word is stored in the index together with all of its synonyms.
Then, when you query the index with any of those words, you'll match all docs that underwent expansion.
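
To make that concrete, here is a toy sketch in plain Python (not Solr code). It assumes a single hypothetical synonym group and a naive whitespace tokenizer, and the names `expand`, `build_index` and `search` are made up for illustration. It only shows that expanding on either side alone finds the same documents:

```python
from collections import defaultdict

# Hypothetical synonym group, standing in for an entry in a synonyms file.
SYNONYMS = [{"tv", "television", "telly"}]


def expand(token):
    """Return the token together with every synonym in its group."""
    for group in SYNONYMS:
        if token in group:
            return set(group)
    return {token}


def build_index(docs, expand_at_index):
    """Build a toy inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            terms = expand(token) if expand_at_index else {token}
            for term in terms:
                index[term].add(doc_id)
    return index


def search(index, query, expand_at_query):
    """Return ids of documents matching any query term (OR semantics)."""
    hits = set()
    for token in query.lower().split():
        terms = expand(token) if expand_at_query else {token}
        for term in terms:
            hits |= index.get(term, set())
    return hits


docs = {1: "cheap tv", 2: "television stand", 3: "garden chair"}

# Expansion only at index time: a bigger index, a plain query.
index_side = search(build_index(docs, True), "telly", False)

# Expansion only at query time: a smaller index, a bigger query.
query_side = search(build_index(docs, False), "telly", True)

# No expansion anywhere: the synonym is never matched.
neither = search(build_index(docs, False), "telly", False)
```

In this sketch `index_side` and `query_side` both contain documents 1 and 2, while `neither` is empty. The two working variants differ only in where the cost is paid: in index size or in per-query work.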

There's no need to expand at both ends. The usual suggestion is to do it at index time, since that moves the work out of the query path and speeds up your queries.

IMHO, the general rule should be to shave off time wherever you can (including the couple of milliseconds spent expanding synonyms at query time) to make the user experience that much better. These small savings can add up substantially.

You could ask the same question about why we encourage data redundancy in documents: there, too, a larger index is accepted in exchange for faster queries.

Marko Bonaci answered Feb 23 '23 08:02