
Getting facet count 0 in solr

I am using Solr search with faceting in my application. In my use case, the index files in the data dir keep changing.

The problem is that when I facet on a particular field, I get values from indices that were previously in the data dir (and are no longer present). However, they are returned with a count of 0. I don't understand where the values from the previous indices are persisted, or why they are returned by a completely new search.

Though I can simply skip the facets with a count of 0, I suspect this could seriously hurt scalability. Any pointers on how to exclude the facets from previous searchers?

[Edit 1]: The current workaround I am using is to add facet.mincount=1 to my URL. But I still suspect this could hurt performance.

Greenhorn asked Apr 09 '12 06:04




2 Answers

I couldn't find a comment option and I don't have enough reputation to upvote! I have exactly the same problem. We are using atomic updates with Solr 4.2.

I found some explanation here: http://collab.sakaiproject.org/pipermail/oae-dev/2011-November/000693.html

Excerpt:

To efficiently handle facets for multi-valued fields (like tags), Solr builds an "uninverted index" (which you think would just be called an "index", but I suppose that's even more confusing), which maps internal document IDs to the list of terms they contain. Calculating facets from this data structure just requires walking over every document in the result set, looking up the terms it contains in the uninverted index, and adding them to the tally for all documents.

However, there's a sneaky optimisation here that causes the zero counts we're seeing. For terms that appear in more than 5% of documents, Solr doesn't include them in the uninverted index (leaving them out helps to keep the size in memory down, I guess), and instead gets the count for these terms using a regular query against the Lucene index. Since the set of "common" terms isn't specific to your result set, and since any given result set won't necessarily contain all of these terms, you can get back counts of zero.

So the zero counts may not come from old index values at all, but simply from terms that exist in more than 5% of documents?
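To make the excerpt concrete, here is a minimal sketch of the counting strategy it describes. It is a toy model, not Solr's actual code: the class name, method names, and in-memory maps are all hypothetical stand-ins, and the split between "rare" and "common" terms (the 5% rule in the excerpt) is simply passed in as input.

```java
import java.util.*;

public class UninvertedFacetSketch {

    // Hypothetical stand-ins for Solr internals; none of these names are Solr APIs.
    // uninverted:  docId -> rare terms stored for that document
    // commonTerms: frequent term -> ids of all documents containing it
    static Map<String, Integer> facetCounts(Map<Integer, List<String>> uninverted,
                                            Map<String, Set<Integer>> commonTerms,
                                            Set<Integer> resultSet) {
        Map<String, Integer> counts = new TreeMap<>();
        // Rare terms: walk the result set and tally the terms of each matching document.
        for (int docId : resultSet) {
            for (String term : uninverted.getOrDefault(docId, Collections.emptyList())) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        // Common terms: each one is counted against the result set separately,
        // so it gets an entry even when no matching document contains it.
        for (Map.Entry<String, Set<Integer>> e : commonTerms.entrySet()) {
            int n = 0;
            for (int docId : e.getValue()) {
                if (resultSet.contains(docId)) n++;
            }
            counts.put(e.getKey(), n);
        }
        return counts;
    }

    // Mimics the effect of facet.mincount: keep only entries with count >= minCount.
    static Map<String, Integer> withMinCount(Map<String, Integer> counts, int minCount) {
        Map<String, Integer> kept = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) kept.put(e.getKey(), e.getValue());
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> uninverted = new HashMap<>();
        uninverted.put(1, Arrays.asList("solr"));
        uninverted.put(4, Arrays.asList("lucene"));
        Map<String, Set<Integer>> commonTerms = new HashMap<>();
        commonTerms.put("news", new HashSet<>(Arrays.asList(1, 2, 3)));

        // Only document 4 matches the query; it does not contain "news".
        Set<Integer> resultSet = new HashSet<>(Arrays.asList(4));

        Map<String, Integer> counts = facetCounts(uninverted, commonTerms, resultSet);
        System.out.println(counts);                  // {lucene=1, news=0}
        System.out.println(withMinCount(counts, 1)); // {lucene=1}
    }
}
```

In this model the zero count for "news" appears because the common-term pass runs regardless of the result set, which would match the behaviour described in the excerpt. Filtering by a minimum count removes it.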

user2023507 answered Oct 20 '22 18:10



I think facet.mincount=n is not a workaround; it is the parameter you should use to get only the facets with a non-zero count (with n=1).

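import org.apache.solr.client.solrj.SolrQuery;
SolrQuery solrQuery = new SolrQuery();
solrQuery.setQuery("*:*");          // match all documents
solrQuery.addFacetField("foobar");  // facet on the "foobar" field
solrQuery.setFacetMinCount(1);      // omit facet values with a count of 0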
Kaidul answered Oct 20 '22 20:10
