We are planning to deploy Solr to search multiple sites published from a common CMS platform.
There will be a separate site per language, and the non-English sites will mostly contain content translated from English.
The search requirements include keyword highlighting, suggestions (“did you mean?”), stopwords, and faceting.
We are evaluating a single core versus a multi-core Solr setup with one core per language. What is the recommended approach here?
In Solr, the term core refers to a single index and its associated transaction log and configuration files (including the solrconfig.xml and schema files, among others).
instanceDir -- The core's instance directory (i.e. the directory under which that core's conf/ and data/ directories are located)
solr.core.dataDir -- The core's data directory (i.e. the directory under which that core's index directory is located)
The solrconfig.xml file is the configuration file with the most parameters affecting Solr itself. While configuring Solr, you'll work with solrconfig.xml often, either directly or via the Config API to create "configuration overlays" (configoverlay.json) that override the values in solrconfig.xml.
You need multicore because stemming and stopword removal are language-specific, so you cannot do them correctly on a single multilingual index.
Common stopwords in English are "by" and "is", but these words mean "town" and "ice" in several Nordic languages.
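To make the collision concrete, here is a minimal Python sketch, outside Solr, of what happens when an English stopword list is applied to Norwegian text. The stopword set is a tiny illustrative one, not the file Solr ships, and remove_stopwords is a hypothetical helper rather than a Solr API.

```python
# Tiny illustrative stopword list; NOT the stopwords file shipped with Solr.
ENGLISH_STOPWORDS = {"a", "an", "by", "is", "the"}


def remove_stopwords(tokens, stopwords):
    """Drop every token that appears in the stopword set (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stopwords]


# Norwegian: "en gammel by" means "an old town".
norwegian_tokens = ["en", "gammel", "by"]

# Applying the English list silently removes the noun "by" (town).
filtered = remove_stopwords(norwegian_tokens, ENGLISH_STOPWORDS)
print(filtered)  # ['en', 'gammel']
```

A search for "by" on the Norwegian site would then match nothing, which is why each language needs its own analysis chain.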
If you do multicore, each language can be on its own core with a customized schema.xml that selects the right stemmer, stopwords and protected words. The same JVM runs all of the cores on the same server, so you are not paying for extra servers for any one language. Then, if the load is too great for one server, you replicate your multicore setup and all of the indexes benefit from the replicas.
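On the client side, a per-language setup mostly means choosing the core from the site's language before querying. A rough sketch, assuming a local Solr at the default port and hypothetical core names (site_en, site_de, site_no); select_url is an illustrative helper, not part of any Solr client library:

```python
from urllib.parse import urlencode

# Assumption: Solr runs locally on its default port.
SOLR_BASE = "http://localhost:8983/solr"

# Hypothetical core names, one per site language.
CORE_BY_LANGUAGE = {"en": "site_en", "de": "site_de", "no": "site_no"}
DEFAULT_LANGUAGE = "en"


def select_url(language, query):
    """Build a /select URL against the core that matches the site language.

    Unknown languages fall back to the English core.
    """
    core = CORE_BY_LANGUAGE.get(language, CORE_BY_LANGUAGE[DEFAULT_LANGUAGE])
    # hl=true turns on highlighting, facet=true turns on faceting.
    params = urlencode({"q": query, "hl": "true", "facet": "true"})
    return f"{SOLR_BASE}/{core}/select?{params}"


print(select_url("no", "by"))
```

Each core then applies its own schema.xml at index and query time, so the Norwegian core keeps "by" while the English core drops it.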