We have a Solr core with about 250 TrieIntFields (declared as dynamicFields). There are about 14M docs in our Solr index, and many documents have values in many of these fields. Over time we need to sort on all 250 of these fields.
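For reference, a declaration along these lines (the field names and precisionStep here are illustrative, not our exact schema) would look like:

```xml
<!-- schema.xml: hypothetical dynamic int field matching names like relevance_11 -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
<dynamicField name="relevance_*" type="tint" indexed="true" stored="true"/>
```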
The issue we are facing is that the underlying Lucene fieldCache fills up very quickly. We have a 4 GB box and the index size is 18 GB. After sorting on 40 or 45 of these dynamic fields, memory consumption is about 90% and we start getting OutOfMemory errors.
For now, we have a cron job that runs every minute and restarts Tomcat if total memory consumption exceeds 80%.
From what I have read, I understand that restricting the number of distinct values in the sortable Solr fields will bring down the fieldCache footprint. The values in these sortable fields can be any integer from 0 to 33000 and are quite widely distributed. We have a few scaling solutions in mind, but what is the best way to handle this whole issue?
UPDATE: We thought that if we used boosting instead of sorting, the query would not go through the fieldCache. So instead of issuing a query like
select?q=name:alba&sort=relevance_11 desc
we tried
select?q={!boost relevance_11}name:alba
but unfortunately boosting also populates the fieldCache :(
I think you have two options:
1) Add more memory.
2) Force Solr not to use the field cache by specifying facet.method=enum, as per the documentation.
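As a sketch of option 2), the enum method can be set per request, or per field via the f.&lt;field&gt;.&lt;param&gt; override (the facet field name here is illustrative):

select?q=name:alba&facet=true&facet.field=relevance_11&f.relevance_11.facet.method=enum

Note that facet.method controls faceting, not sorting, so it only helps if the memory pressure comes from facet queries.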
There's also a solr-user mailing list thread discussing the same problem.
Unless your index is huge, I'd go with option 1). RAM is cheap these days.