Solr vs document encoding problems

I am using SolrJ 1.4, and it does not index UTF-16 encoded documents properly. My guess is that during conversion to Unicode it replaces the problematic UTF-16 surrogate pairs with the Unicode replacement character U+FFFD. Can anyone explain how to configure SolrJ 1.4 so that it can index and search UTF-16 documents as well as UTF-8 ones?

user911084 asked Nov 04 '22 15:11


1 Answer

The Solr index is stored in UTF-8 (see the Solr FAQ entry "Why don't International Characters Work?"). To index or search content that arrives in another encoding, you can always perform the conversion in the software that talks to Solr, before the text is sent.
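As a rough illustration of doing that conversion on the client side, the sketch below decodes raw UTF-16 bytes into a Java String before the text would be handed to SolrJ. The class name `Utf16ToSolrText` and the helper `decodeUtf16` are hypothetical names chosen for this example, not part of SolrJ, and it assumes Java 7 or later for `StandardCharsets`. The decoder is set to report malformed input, so broken surrogate pairs raise an error instead of silently turning into U+FFFD.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf16ToSolrText {

    // Hypothetical helper: strictly decode UTF-16 bytes into a Java String.
    // The UTF_16 charset honours a byte-order mark and falls back to
    // big-endian when none is present.
    static String decodeUtf16(byte[] raw) {
        try {
            return StandardCharsets.UTF_16
                    .newDecoder()
                    // Fail loudly instead of substituting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Input is not valid UTF-16", e);
        }
    }

    public static void main(String[] args) {
        // Sample text with a character outside the BMP (a surrogate pair).
        String original = "na\u00efve \uD834\uDD1E";
        byte[] utf16 = original.getBytes(StandardCharsets.UTF_16);

        String text = decodeUtf16(utf16);

        // The decoded String is what would be placed in a Solr document field;
        // the client library then serialises the request for Solr.
        System.out.println("round trip ok: " + text.equals(original));
        System.out.println("contains U+FFFD: " + (text.indexOf('\uFFFD') >= 0));
    }
}
```

If the replacement character still appears after a strict decode like this, the bytes were probably read with the wrong charset earlier in the pipeline (for example via a `Reader` using the platform default), so that is the place to look next.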

Johan Sjöberg answered Nov 12 '22 18:11