
Query multiple collections with different fields in solr

Tags: solr

Given the following (single core) queries:

http://localhost/solr/a/select?indent=true&q=*:*&rows=100&start=0&wt=json
http://localhost/solr/b/select?indent=true&q=*:*&rows=100&start=0&wt=json

The first query returns "numFound":40000. The second query returns "numFound":10000.

I tried putting these together by:

   http://localhost/solr/a/select?indent=true&shards=localhost/solr/a,localhost/solr/b&q=*:*&rows=100&start=0&wt=json

Now I get "numFound":50000. The only problem is that "a" has more columns than "b", so the multiple collections request only returns the values of "a".

Is it possible to query multiple collections with different fields? Or do they have to be the same? And how should I change my third url to get this result?

Vincent asked Oct 11 '13 08:10




1 Answer

What you need is what I call a unification core. That core itself will have no content; it is only used as a wrapper to unify the fields you want to display from both cores. In there you will need

  • a schema.xml that wraps up all the fields that you want to have in your unified result
  • a query handler that combines the two different cores for you

First, an important restriction, taken from the Solr Wiki page about DistributedSearch:

Documents must have a unique key and the unique key must be stored (stored="true" in schema.xml). The unique key field must be unique across all shards. If docs with duplicate unique keys are encountered, Solr will make an attempt to return valid results, but the behavior may be non-deterministic.
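To make the restriction concrete, here is a minimal Python sketch of a pre-flight check for unique keys that appear in more than one shard. It works on plain lists of documents; the helper name `find_duplicate_keys`, the shard names, and the sample data are all hypothetical and not part of Solr.

```python
from collections import defaultdict


def find_duplicate_keys(docs_by_shard, key="id"):
    """Return {key_value: [shard names]} for keys seen in more than one shard."""
    seen = defaultdict(set)
    for shard, docs in docs_by_shard.items():
        for doc in docs:
            seen[doc[key]].add(shard)
    return {k: sorted(shards) for k, shards in seen.items() if len(shards) > 1}


# Hypothetical sample data, not taken from a real index.
sample = {
    "shard-1": [{"id": 1}, {"id": 3}],
    "shard-2": [{"id": 2}, {"id": 3}],
}
duplicates = find_duplicate_keys(sample)
print(duplicates)
```

The check only reports keys shared between different shards; duplicates inside a single shard are not flagged.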

As an example, I have shard-1 with the fields id, title, description and shard-2 with the fields id, title, abstractText. So I have these schemas:

schema of shard-1

<schema name="shard-1" version="1.5">

  <fields>
    <field name="id"
          type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" 
          type="text" indexed="true" stored="true" multiValued="false" />
    <field name="description"
          type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
  <!-- type definition left out, have a look in github -->
</schema>

schema of shard-2

<schema name="shard-2" version="1.5">

  <fields>
    <field name="id" 
      type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" 
      type="text" indexed="true" stored="true" multiValued="false" />
    <field name="abstractText" 
      type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
  <!-- type definition left out, have a look in github -->
</schema>

To unify these schemas I create a third schema that I call shard-unification, which contains all four fields.

<schema name="shard-unification" version="1.5">

  <fields>
    <field name="id" 
      type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" 
      type="text" indexed="true" stored="true" multiValued="false" />
    <field name="abstractText" 
      type="text" indexed="true" stored="true" multiValued="false" />
    <field name="description" 
      type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
  <!-- type definition left out, have a look in github -->
</schema>
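The unified field list is simply the union of the fields of both shards. As a rough sketch, it could be computed like this in Python. The helper `unify_fields` is hypothetical, and the rule that a shared field must have the same type in every shard is my assumption for the sketch, not something stated by Solr.

```python
def unify_fields(*schemas):
    """Merge {field_name: type} mappings; raise if a field has conflicting types."""
    unified = {}
    for schema in schemas:
        for name, field_type in schema.items():
            # setdefault stores the first type seen and returns it afterwards.
            if unified.setdefault(name, field_type) != field_type:
                raise ValueError(f"field {name!r} has conflicting types")
    return unified


# Field names and types copied from the two example schemas above.
shard_1 = {"id": "int", "title": "text", "description": "text"}
shard_2 = {"id": "int", "title": "text", "abstractText": "text"}

unified = unify_fields(shard_1, shard_2)
print(sorted(unified))
```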

Now I need to make use of this combined schema, so I create a query handler in the solrconfig.xml of the shard-unification core

<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="qf">id title description abstractText</str>
    <str name="fl">*,score</str>
    <str name="mm">100%</str>
  </lst>
</requestHandler>
<queryParser name="edismax" class="org.apache.solr.search.ExtendedDismaxQParserPlugin" />

That's it. Now shard-1 and shard-2 need some indexed data. To query for a unified result, just query shard-unification with the appropriate shards param.

http://localhost/solr/shard-unification/select?q=*:*&rows=100&start=0&wt=json&shards=localhost/solr/shard-1,localhost/solr/shard-2
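If you build this URL in code, a small sketch with Python's standard library could look like the following. Host, core names, and parameters are the ones from the URL above; the `safe` argument keeps `*`, `:`, `/` and `,` unescaped so the result matches the hand-written URL, which is a readability choice rather than a requirement.

```python
from urllib.parse import urlencode

# Core that only carries the unified schema (name taken from the example above).
base = "http://localhost/solr/shard-unification/select"

# The shards to combine, written without the protocol, as in the URL above.
shards = ["localhost/solr/shard-1", "localhost/solr/shard-2"]

params = {
    "q": "*:*",
    "rows": 100,
    "start": 0,
    "wt": "json",
    "shards": ",".join(shards),
}

# safe= leaves these characters as-is instead of percent-encoding them.
url = base + "?" + urlencode(params, safe="*:/,")
print(url)
```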

This will return you a result like

{
  "responseHeader":{
    "status":0,
    "QTime":10},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "id":1,
        "title":"title 1",
        "description":"description 1",
        "score":1.0},
      {
        "id":2,
        "title":"title 2",
        "abstractText":"abstract 2",
        "score":1.0}]
  }}
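Note that each document only carries the fields its own shard knows about, so a client has to cope with missing keys. A minimal Python sketch, using the sample response above as a literal string (no request is sent), could read either text field like this:

```python
import json

# The sample response from above, embedded as a string for illustration.
raw = """
{
  "responseHeader": {"status": 0, "QTime": 10},
  "response": {"numFound": 2, "start": 0, "maxScore": 1.0, "docs": [
    {"id": 1, "title": "title 1", "description": "description 1", "score": 1.0},
    {"id": 2, "title": "title 2", "abstractText": "abstract 2", "score": 1.0}
  ]}
}
"""

docs = json.loads(raw)["response"]["docs"]

# A document from shard-1 has description, one from shard-2 has abstractText,
# so fall back from one field to the other with dict.get.
rows = [
    (doc["id"], doc["title"], doc.get("description") or doc.get("abstractText"))
    for doc in docs
]
for row in rows:
    print(row)
```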

Fetch the origin shard of a document

If you want each document to include its originating shard, you just need to specify [shard] within fl, either as a parameter with the query or within the request handler's defaults, see below. The brackets are mandatory; they will also be in the resulting response.

<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="qf">id title description abstractText</str>
    <str name="fl">*,score,[shard]</str>
    <str name="mm">100%</str>
  </lst>
</requestHandler>
<queryParser name="edismax" class="org.apache.solr.search.ExtendedDismaxQParserPlugin" />
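On the client side, the "[shard]" key can then be used to group documents by their origin. The following Python sketch assumes the value is the shard address as passed in the shards parameter; the sample documents and the helper `group_by_shard` are hypothetical.

```python
from collections import defaultdict


def group_by_shard(docs):
    """Map each "[shard]" value to the ids of the documents that came from it."""
    groups = defaultdict(list)
    for doc in docs:
        # Documents without the key end up under "unknown".
        groups[doc.get("[shard]", "unknown")].append(doc["id"])
    return dict(groups)


# Hypothetical documents; the "[shard]" values are assumed, not real output.
docs = [
    {"id": 1, "title": "title 1", "[shard]": "localhost/solr/shard-1"},
    {"id": 2, "title": "title 2", "[shard]": "localhost/solr/shard-2"},
]
by_shard = group_by_shard(docs)
print(by_shard)
```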

Working Sample

If you want to see a running example, check out my solrsample project on github and execute the ShardUnificationTest. It now also includes fetching the originating shard.

cheffe answered Sep 17 '22 18:09