
Solr/Solrj pagination

I am using Solr and SolrJ for indexing and search in a web app I am creating. My request handler is configured in solrconfig.xml as follows:

<requestHandler name="/select" class="solr.SearchHandler">
 <lst name="defaults">
   <str name="echoParams">explicit</str>
   <str name="start">0</str>
   <int name="rows">10</int>
   <str name="defType">edismax</str>
   <str name="qf">
      title^10.0 subtitle^7.0 abstract^5.0 content^1.0 text^1.0
   </str>
   <str name="pf">
      title^10.0 subtitle^7.0 abstract^5.0 content^1.0 text^1.0
   </str>
   <str name="df">text</str>

 </lst>
</requestHandler>

As it stands, indexing and searching work well. However, I want to implement pagination. The config file contains "start" and "rows" defaults. However, in SolrJ, when I run:

SolrQuery query = new SolrQuery(searchTerm);
System.out.println(query.getRequestHandler());
System.out.println(query.getRows());
System.out.println(query.getStart());

The three print statements each show null. I understand each of those 'gets' has a corresponding 'set', but I would have imagined that they would already be set via the request handler in solrconfig.xml. Can someone clue me in?

Mike Nitchie asked Jun 07 '13 20:06




2 Answers

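Before executing the query on the server, the client would not know about what you have set on the server side, right? A SolrQuery object only holds the parameters you set on it explicitly; the defaults in solrconfig.xml are applied by the server when the request arrives. So it is no surprise that they are all null.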

To implement pagination you need two parameters from the client side: the page number and the number of items per page. Once you have these two, you can construct your SolrQuery on the client side as follows:

SolrQuery query = new SolrQuery(searchTerm);
// pageNum is 1-based, so page 1 starts at offset 0
query.setStart((pageNum - 1) * numItemsPerPage);
query.setRows(numItemsPerPage);
// execute the query on the server and get results
QueryResponse res = solrServer.query(query);
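The offset arithmetic above is easy to get wrong by one, so here is a minimal sketch of it in plain Java. PageMath, startOffset and totalPages are hypothetical names of my own, not part of SolrJ. The numFound argument is assumed to come from res.getResults().getNumFound() after the query has run.

```java
// Hypothetical helper, not part of SolrJ: converts a 1-based page number
// into the zero-based "start" offset and derives the page count.
final class PageMath {
    private PageMath() {}

    /** Zero-based offset of the first result on the given 1-based page. */
    static int startOffset(int pageNum, int rowsPerPage) {
        if (pageNum < 1 || rowsPerPage < 1) {
            throw new IllegalArgumentException("pageNum and rowsPerPage must be >= 1");
        }
        // multiplyExact throws instead of silently overflowing on huge page numbers
        return Math.multiplyExact(pageNum - 1, rowsPerPage);
    }

    /** Number of pages needed to show numFound hits (0 when there are no hits). */
    static long totalPages(long numFound, int rowsPerPage) {
        if (numFound < 0 || rowsPerPage < 1) {
            throw new IllegalArgumentException("numFound must be >= 0 and rowsPerPage >= 1");
        }
        // Ceiling division: a partially filled last page still counts as a page
        return (numFound + rowsPerPage - 1) / rowsPerPage;
    }

    public static void main(String[] args) {
        System.out.println(startOffset(3, 50));  // offset to pass to setStart for page 3
        System.out.println(totalPages(101, 50)); // pages needed for 101 hits
    }
}
```

With something like this in place, the value passed to query.setStart would be PageMath.startOffset(pageNum, numItemsPerPage), and the page count can drive the "next"/"last" links in the UI.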
arun answered Sep 25 '22 20:09


As @arun stated in his answer, "the client would not know about what you have set on the server side", so don't be surprised that they are empty. On the other hand, I would warn you about pagination problems that can arise in some situations.

Pagination is simple when you have few documents to read: all you have to do is play with the start and rows parameters.

So for a client that wants 50 results per page, page #1 is requested using start=0&rows=50. Page #2 is start=50&rows=50, page #3 is start=100&rows=50, etc. But in order for Solr to know which 50 docs to return starting at an arbitrary point N, it needs to build up an internal queue of the first N+50 sorted documents matching the query, so that it can then throw away the first N docs and return the remaining 50. This means the amount of memory needed to return paginated results grows linearly with the increase in the start param.
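To make the N+rows cost concrete, here is a toy model in plain Java. It is not Solr's actual code, just a sketch of the same idea under my own assumptions: documents are plain integers sorted ascending, and OffsetPagingModel and page are hypothetical names. The queue has to retain start + rows entries even though only rows of them are returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Toy model of offset-based paging (an illustration, not Solr's implementation).
final class OffsetPagingModel {
    private OffsetPagingModel() {}

    /** Returns the values at sorted positions [start, start + rows). */
    static List<Integer> page(List<Integer> values, int start, int rows) {
        int keep = start + rows; // the queue must hold this many entries
        // Max-heap: when the queue overflows, the largest value is evicted,
        // so the queue always holds the `keep` smallest values seen so far.
        PriorityQueue<Integer> queue = new PriorityQueue<>(Collections.reverseOrder());
        for (int v : values) {
            queue.add(v);
            if (queue.size() > keep) {
                queue.poll();
            }
        }
        List<Integer> sorted = new ArrayList<>(queue);
        Collections.sort(sorted);
        // Throw away the first `start` entries and return what is left.
        int from = Math.min(start, sorted.size());
        return new ArrayList<>(sorted.subList(from, sorted.size()));
    }

    public static void main(String[] args) {
        List<Integer> docs = Arrays.asList(5, 3, 9, 1, 7, 2, 8, 4, 6, 10);
        // start=4, rows=3: seven values are retained, three are returned
        System.out.println(page(docs, 4, 3));
    }
}
```

In this model a request for start=1000000&rows=50 would retain 1,000,050 entries to return 50, which is the linear growth described above.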

So if you have many documents, meaning hundreds of thousands or even millions, this approach is not feasible.
It is the kind of thing that can bring your Solr server to its knees.

For typical applications displaying search results to a human user, this tends to not be much of an issue since most users don’t care about drilling down past the first handful of pages of search results — but for automated systems that want to crunch data about all of the documents matching a query, it can be seriously prohibitive.

This means that if you have a website and are paging search results, a real user will not go that far. Consider, on the other hand, what can happen if a spider or a scraper tries to read all of the website's pages. Now we are talking about deep paging.

I suggest reading this post:

https://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/

And take a look at this documentation page:

https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results

And here is an example that shows how to paginate using cursors.

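import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.ORDER;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

// solrClient is assumed to be an already configured SolrJ client
SolrQuery solrQuery = new SolrQuery();
solrQuery.setRows(500);
solrQuery.setQuery("*:*");
// Cursors require a sort that includes the uniqueKey field (here "id")
solrQuery.addSort("id", ORDER.asc);
String cursorMark = CursorMarkParams.CURSOR_MARK_START;
boolean done = false;
while (!done) {
    solrQuery.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
    QueryResponse rsp = solrClient.query(solrQuery);
    String nextCursorMark = rsp.getNextCursorMark();
    for (SolrDocument d : rsp.getResults()) {
        // ...
    }
    // An unchanged cursor mark means there are no more results
    if (cursorMark.equals(nextCursorMark)) {
        done = true;
    }
    cursorMark = nextCursorMark;
}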
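If the loop above looks opaque, the following toy model in plain Java may help show why a cursor avoids the deep paging cost. It is only a sketch under my own assumptions, not how Solr implements cursors: CursorModel, fetchAfter and readAll are hypothetical names, and the cursor here is simply the last id returned, whereas Solr's real cursorMark is an opaque string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model of cursor-based iteration over ids sorted ascending
// (an illustration, not Solr's implementation).
final class CursorModel {
    private final TreeSet<String> ids;

    CursorModel(Collection<String> ids) {
        this.ids = new TreeSet<>(ids);
    }

    /** Returns up to `rows` ids strictly after `cursor`; a null cursor means "from the start". */
    List<String> fetchAfter(String cursor, int rows) {
        NavigableSet<String> tail = (cursor == null) ? ids : ids.tailSet(cursor, false);
        List<String> page = new ArrayList<>();
        for (String id : tail) {
            if (page.size() == rows) {
                break;
            }
            page.add(id);
        }
        return page;
    }

    /** Reads every id, one page at a time, by following the cursor. */
    List<String> readAll(int rows) {
        List<String> all = new ArrayList<>();
        String cursor = null;
        while (true) {
            List<String> page = fetchAfter(cursor, rows);
            if (page.isEmpty()) {
                break; // plays the role of the cursor mark no longer changing
            }
            all.addAll(page);
            cursor = page.get(page.size() - 1); // resume after the last id seen
        }
        return all;
    }

    public static void main(String[] args) {
        CursorModel model = new CursorModel(Arrays.asList("d", "a", "c", "b", "e"));
        System.out.println(model.readAll(2));
    }
}
```

In this model each fetch seeks directly to the position after the cursor and collects at most rows entries, so nothing before the cursor has to be collected and thrown away, however deep the iteration goes.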
freedev answered Sep 26 '22 20:09