
Solr/Lucene fieldCache OutOfMemory error sorting on dynamic field

We have a Solr core that has about 250 TrieIntFields (declared as dynamicField). There are about 14M docs in our Solr index and many documents have some value in many of these fields. We have a need to sort on all of these 250 fields over a period of time.

The issue we are facing is that the underlying Lucene fieldCache fills up very quickly. We have a 4 GB box and the index size is 18 GB. After sorting on 40 or 45 of these dynamic fields, memory consumption reaches about 90% and we start getting OutOfMemory errors.
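A rough back-of-envelope suggests why the cache fills so fast (a sketch, assuming the fieldCache keeps one 32-bit int per document per sorted field, which is how Lucene's FieldCache handles int sorting):

```python
# Approximate fieldCache footprint for sorting on TrieIntFields.
# Assumption: one 32-bit int per document per cached field.
docs = 14_000_000          # documents in the index
bytes_per_doc = 4          # 32-bit int per doc per field

per_field_mb = docs * bytes_per_doc / 1024**2
print(f"~{per_field_mb:.0f} MB per sorted field")    # ~53 MB per sorted field

fields = 45                # fields sorted on before OOM
total_gb = fields * per_field_mb / 1024
print(f"~{total_gb:.1f} GB after {fields} fields")   # ~2.3 GB after 45 fields
```

At roughly 53 MB per field, 45 sorted fields already pin about 2.3 GB, which on a 4 GB box (alongside the rest of the heap) is consistent with the 90% usage we see.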

For now, we have a cron job running every minute that restarts Tomcat if total memory consumption exceeds 80%.

From what I have read, I understand that restricting the number of distinct values on sortable Solr fields will bring down the fieldCache space. The values in these sortable fields can be any integer from 0 to 33000 and quite widely distributed. We have a few scaling solutions in mind, but what is the best way to handle this whole issue?
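If restricting distinct values does help (as suggested above), one option would be to quantize the scores at index time so each sortable field carries far fewer distinct values. A hypothetical sketch (the bucket size is an assumption, not part of our current setup):

```python
# Hypothetical index-time quantization: collapse raw scores (0..33000)
# into coarser buckets so the sortable field has fewer distinct values.
BUCKET_SIZE = 128  # assumption: yields ~258 distinct buckets instead of 33001 values

def to_bucket(raw_value: int) -> int:
    """Map a raw score to its bucket id; documents within a bucket sort as equals."""
    return raw_value // BUCKET_SIZE

print(to_bucket(0), to_bucket(33000))  # 0 257
```

The trade-off is that sort order within a bucket is lost, which may or may not be acceptable depending on how fine-grained the ranking needs to be.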

UPDATE: We thought that boosting, unlike sorting, would not go through the fieldCache. So instead of issuing a query like

select?q=name:alba&sort=relevance_11 desc

we tried

select?q={!boost relevance_11}name:alba

but unfortunately boosting also populates the fieldCache :(

asked Nov 15 '12 by arun

1 Answer

I think you have two options:

1) Add more memory.
2) Force Solr not to use the fieldCache by specifying facet.method=enum, as per the documentation.
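For option 2), a request would look something like the following (illustrative only: facet.method=enum applies to faceting requests, and relevance_11 here stands in for any one of the dynamic fields from the question):

select?q=name:alba&facet=true&facet.field=relevance_11&facet.method=enum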

There's also a solr-user mailing list thread discussing the same problem.

Unless your index is huge, I'd go with option 1). RAM is cheap these days.

answered Nov 02 '22 by mindas