
Solr multilingual search

I'm currently working on a project where we have indexed text content in Solr. Each piece of content is written in one specific language (we have 4 different European languages). We would like to add a feature so that, if the primary search (the text entered by the user) doesn't return many results, we also try to look for documents in the other languages. To do that we would somehow need to translate the query. Our starting point is that we can build a mapping list of translations for the words commonly used in the project's domain.
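A minimal sketch of the fallback described above, assuming the mapping list is available as a hypothetical nested dict (`term -> {language code: translation}`) and that the actual Solr call is wrapped in a `search(query, lang)` function you supply. The translation is a naive word-by-word lookup: multi-word phrases, punctuation and inflected forms are not handled.

```python
from typing import Callable, Dict, List

# Hypothetical glossary shape: source term -> {language code: translated term}.
Glossary = Dict[str, Dict[str, str]]


def translate_query(query: str, glossary: Glossary, target_lang: str) -> str:
    """Translate word by word; words are lowercased, and words missing
    from the glossary are kept unchanged."""
    return " ".join(
        glossary.get(word, {}).get(target_lang, word)
        for word in query.lower().split()
    )


def search_with_fallback(
    query: str,
    user_lang: str,
    other_langs: List[str],
    glossary: Glossary,
    search: Callable[[str, str], List[str]],
    min_hits: int = 10,
) -> List[str]:
    """Run the primary search; if it returns fewer than min_hits results,
    translate the query into each other language and append those results."""
    hits = list(search(query, user_lang))
    if len(hits) >= min_hits:
        return hits
    for lang in other_langs:
        translated = translate_query(query, glossary, lang)
        hits.extend(search(translated, lang))
    return hits


if __name__ == "__main__":
    glossary = {"hello": {"fi": "hei", "de": "hallo"}}

    # Stand-in for a real Solr request: returns one fake hit per call.
    def fake_search(q: str, lang: str) -> List[str]:
        return [f"{lang}:{q}"]

    print(search_with_fallback("hello", "en", ["fi", "de"], glossary, fake_search, min_hits=2))
```

The `min_hits` threshold is an arbitrary placeholder for "doesn't return many results"; in practice you would tune it, or compare Solr's `numFound` rather than the number of returned rows.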

One solution that came to mind was to use the synonym search feature, but this might not give the best results.
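To try the synonym idea, the same mapping list could be turned into lines for a Solr synonyms file, where each line lists equivalent terms separated by commas. This is a sketch assuming the hypothetical nested-dict glossary shape (`term -> {language code: translation}`); it only generates the text, it does not configure Solr. One likely reason this gives mixed results is that each language is usually analyzed differently (for example, with its own stemmer), so a cross-language synonym may not line up with how the other language's terms are indexed.

```python
from typing import Dict, List

# Hypothetical glossary shape: source term -> {language code: translated term}.
Glossary = Dict[str, Dict[str, str]]


def glossary_to_synonym_lines(glossary: Glossary) -> List[str]:
    """Emit one line per glossary entry, listing the term and its
    translations as equivalent synonyms (comma-separated)."""
    lines = []
    for term in sorted(glossary):
        group = [term]
        for lang in sorted(glossary[term]):
            translation = glossary[term][lang]
            if translation not in group:  # skip duplicates, e.g. identical spellings
                group.append(translation)
        lines.append(", ".join(group))
    return lines


if __name__ == "__main__":
    sample = {"hello": {"fi": "hei", "de": "hallo"}}
    print("\n".join(glossary_to_synonym_lines(sample)))
```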

Does anyone have pointers to existing modules that could help us build this multilingual search feature? Or design ideas we could investigate?

Thanks

benjamin.donze asked Nov 19 '17 12:11



2 Answers

Multilingual search is a fairly common problem.

Please take a look at http://lucene.472066.n3.nabble.com/Multilingual-Search-td484201.html and at the question "Solr index and search multilingual data".

Both links suggest having a dedicated field for each language. Alternatively, you can store the language in a single field and add a filter query (&fq=) for the language detected from the user's query. I think this is the more scalable solution.
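A minimal sketch of building the request parameters for this single-field approach, assuming a hypothetical `language` field on each document and that the language has already been detected from the user's query (the detection step is not shown). It only builds the query string for Solr's `/select` handler; it does not send the request.

```python
from urllib.parse import urlencode


def build_select_params(user_query: str, detected_lang: str, rows: int = 10) -> str:
    """Build /select parameters that restrict results to one language
    via a filter query (fq)."""
    params = {
        "q": user_query,
        # 'language' is a hypothetical field name holding each document's language code.
        "fq": f"language:{detected_lang}",
        "rows": rows,
    }
    return urlencode(params)


if __name__ == "__main__":
    print(build_select_params("hello world", "en"))
```

A filter query does not affect scoring and can be cached by Solr independently of `q`, which fits a filter that only selects a subset of documents by language.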

Harry Yoo answered Nov 09 '22 21:11



One option is to translate your terms at index time, either inside Solr or before Solr at the application level, and store each translated text in its own field, so a document would have fields like:

text_en: "Hello",
text_fi: "Hei"

Then you can just query text_en:Hello and it would match.
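A minimal sketch of preparing such a document at the application level before indexing. It assumes a `translate(text, target_lang)` callable you provide (for example, one built on your domain word list); the `text_<lang>` field names follow the example above, while the `primary_language` field is assumed here to record the original language. The resulting dict is ready to be serialized as JSON and sent to Solr's update handler.

```python
import json
from typing import Callable, Dict, List


def build_multilingual_doc(
    doc_id: str,
    text: str,
    source_lang: str,
    target_langs: List[str],
    translate: Callable[[str, str], str],
) -> Dict[str, str]:
    """Store the original text plus one translated copy per target language,
    each in its own text_<lang> field."""
    doc = {
        "id": doc_id,
        "primary_language": source_lang,  # assumed field recording the original language
        f"text_{source_lang}": text,
    }
    for lang in target_langs:
        doc[f"text_{lang}"] = translate(text, lang)
    return doc


if __name__ == "__main__":
    # Stand-in translator backed by a tiny hypothetical word list.
    word_list = {"fi": {"Hello": "Hei"}}

    def translate(text: str, lang: str) -> str:
        return word_list.get(lang, {}).get(text, text)

    doc = build_multilingual_doc("1", "Hello", "en", ["fi"], translate)
    print(json.dumps([doc]))
```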

And if you want matches in the primary language to score higher, you could add a primary_language field and boost documents whose primary language matches the search language.
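One way to express that boost is the eDisMax query parser's `bq` (boost query) parameter. A sketch assuming the `text_<lang>` and `primary_language` fields from this answer; the boost factor of 2.0 is an arbitrary placeholder to tune against your own relevance tests.

```python
from urllib.parse import urlencode


def build_boosted_params(user_query: str, search_lang: str, boost: float = 2.0) -> str:
    """Search the field for the user's language and boost documents
    originally written in that language."""
    params = {
        "defType": "edismax",
        "q": user_query,
        # Search the (possibly translated) text field for the user's language.
        "qf": f"text_{search_lang}",
        # Add score to documents whose primary language matches the search language.
        "bq": f"primary_language:{search_lang}^{boost}",
    }
    return urlencode(params)


if __name__ == "__main__":
    print(build_boosted_params("hello", "en"))
```

Documents in other languages still match through their translated fields; the boost only changes their relative order.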

Mico answered Nov 09 '22 21:11
