
SOLR - Grouping results with group.limit returns wrong numFound

When I run a search with result grouping and set group.limit, numFound is the same as when I don't use the limit.

It looks like SOLR first performs the search and calculates numFound, and only then limits the results.

This means I can't rely on numFound for pagination and similar features. Is there a workaround, or did I miss something?


Example:

======================================
| id | publisher  | book_title      |
======================================
| 1  | A1         | Title Book      |
| 2  | A1         | Book title 123  |
| 3  | A1         | My book         |
| 4  | B2         | Hi book title   |
| 5  | B2         | Another Book    |
======================================

If I run this query:

q=book_title:book
&group=true 
&group.field=publisher 
&group.limit=1
&group.main=true 

I get numFound 5 but only 2 documents in the results.

"response": {
    "numFound": 5,
    "docs": [
        {
            "book_title": "My book",
            "publisher":  "A1"
        },
        {
            "book_title": "Another Book",
            "publisher":  "B2"
        }
    ]
}
tasmaniski asked Dec 25 '13 18:12





2 Answers

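Set group.ngroups to true. The grouped response then reports ngroups (the number of groups that matched) next to matches (the number of documents that matched), for example: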

"grouped": {
"bl_version_id": {
  "matches": 53,
  "ngroups": 18,
  "groups": [
    {
...
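
For reference, here is a minimal Python sketch of building such a request and reading both counts from the parsed JSON. The helper names (build_group_query, group_counts) and the sample response are hypothetical; only the parameter names come from the question and this answer. It reads the "grouped" section shown above, so it assumes the response comes back in that format.

```python
from urllib.parse import urlencode


def build_group_query(q, field, limit=1):
    """Return a query string asking Solr for grouped results plus a group count."""
    params = [
        ("q", q),
        ("group", "true"),
        ("group.field", field),
        ("group.limit", str(limit)),
        ("group.ngroups", "true"),  # ask Solr to report the number of groups
        ("wt", "json"),
    ]
    return urlencode(params)


def group_counts(response, field):
    """Extract (matches, ngroups) for one grouped field from a parsed JSON response."""
    grouped = response["grouped"][field]
    return grouped["matches"], grouped["ngroups"]


# Hypothetical response, shaped like the output above but using the question's data.
sample = {"grouped": {"publisher": {"matches": 5, "ngroups": 2, "groups": []}}}

query_string = build_group_query("book_title:book", "publisher")
matches, ngroups = group_counts(sample, "publisher")
print(query_string)
print(matches, ngroups)
```

In this sample, ngroups is the value to paginate on when each page shows one entry per group, while matches is still the total number of matching documents.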
Kanu answered Sep 22 '22 10:09



I had the same problem and couldn't find a way to fix the root cause, so here is the workaround I used.

What I did:

  1. Facet on the field I'm grouping on.
  2. Count the distinct facet values that have a non-zero count (or set facet.mincount=1 so that only those are returned). This matches the number of groups (2 in your case).

Add these faceting parameters to your query:

&facet=true
&facet.limit=-1
&facet.field=publisher

Notes:

  • This is a bit expensive, but it's the only way that worked for me (so far).
  • This will only work if publisher is not multi-valued.
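
A rough Python sketch of the counting step, assuming the default JSON facet layout in which each field is a flat list alternating value and count. The function name and the sample response are made up for illustration. Values with a count of zero are skipped, since they can show up when facet.mincount is left at its default.

```python
def count_groups_from_facets(response, field):
    """Count distinct facet values that have at least one matching document."""
    flat = response["facet_counts"]["facet_fields"][field]
    # Flat layout: [value1, count1, value2, count2, ...]
    counts = flat[1::2]
    return sum(1 for c in counts if c > 0)


# Hypothetical response for the question's data, plus a publisher with no matches.
sample = {
    "facet_counts": {
        "facet_fields": {
            "publisher": ["A1", 3, "B2", 2, "C3", 0]
        }
    }
}

group_total = count_groups_from_facets(sample, "publisher")
print(group_total)
```

The result can then stand in for numFound when computing the number of pages of grouped results.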
mjalajel answered Sep 24 '22 10:09
