I use Lucene.net for indexing content & documents etc.. on websites. The index is very simple and has this format:
LuceneId - unique id for Lucene (TypeId + ItemId) TypeId - the type of text (eg. page content, product, public doc etc..) ItemId - the web page id, document id etc.. Text - the text indexed Title - web page title, document name etc.. to display with the search results
I've got these options to adapt it to serve multi-lingual content:
Which is the best option - or is there another? I've not used multiple indexes before so I'm leaning toward the second.
I do [2], but one problem I have is that I cannot use different analyzers depending on the language. I've combined the stopwords of the languages I want, but I lose the capability of more advanced stuff that the analyzer will offer such as stemming etc.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With