Query multiple collections with different fields in Solr

Tags:

solr

Given the following (single-core) queries:

http://localhost/solr/a/select?indent=true&q=*:*&rows=100&start=0&wt=json
http://localhost/solr/b/select?indent=true&q=*:*&rows=100&start=0&wt=json

The first query returns "numFound":40000. The second query returns "numFound":10000.

I tried putting these together by:

   http://localhost/solr/a/select?indent=true&shards=localhost/solr/a,localhost/solr/b&q=*:*&rows=100&start=0&wt=json

Now I get "numFound":50000. The only problem is that "a" has more fields than "b", so the multi-collection request only returns the values of "a".

Is it possible to query multiple collections with different fields, or do they have to be the same? And how should I change my third URL to get this result?


asked Oct 11 '13 08:10


1 Answer

What you need is what I call a unification core. That core itself holds no content; it only serves as a wrapper that unifies the fields you want to display from both cores. For it you will need:

a schema.xml that wraps up all the fields that you want to have in your unified result
a query handler that combines the two different cores for you

An important restriction up front, taken from the Solr Wiki page about DistributedSearch:

Documents must have a unique key and the unique key must be stored (stored="true" in schema.xml). The unique key field must be unique across all shards. If docs with duplicate unique keys are encountered, Solr will make an attempt to return valid results, but the behavior may be non-deterministic.
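Because of that restriction, it can be worth checking for colliding keys before combining two cores. The following is only a minimal sketch: it assumes you have already collected the ids of each core yourself (for example by querying each core with fl=id), and the function name and the sample id lists are hypothetical.

```python
from collections import Counter


def duplicate_keys(*shard_ids):
    """Return, sorted, every key that occurs more than once across all given id lists."""
    counts = Counter()
    for ids in shard_ids:
        counts.update(ids)
    return sorted(key for key, seen in counts.items() if seen > 1)


# Hypothetical id lists, one per core (not taken from a real index).
shard_1_ids = [1, 3, 5]
shard_2_ids = [2, 3, 4]

# Any key reported here would violate the "unique across all shards" rule.
print(duplicate_keys(shard_1_ids, shard_2_ids))
```

An empty result means the two id lists can be combined without key collisions.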

As an example, I have shard-1 with the fields id, title, description and shard-2 with the fields id, title, abstractText. So I have these schemas:

schema of shard-1

<schema name="shard-1" version="1.5">

  <fields>
    <field name="id"
          type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" 
          type="text" indexed="true" stored="true" multiValued="false" />
    <field name="description"
          type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
  <!-- type definition left out, have a look in github -->
</schema>

schema of shard-2

<schema name="shard-2" version="1.5">

  <fields>
    <field name="id" 
      type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" 
      type="text" indexed="true" stored="true" multiValued="false" />
    <field name="abstractText" 
      type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
  <!-- type definition left out, have a look in github -->
</schema>

To unify these schemas I create a third schema that I call shard-unification, which contains all four fields.

<schema name="shard-unification" version="1.5">

  <fields>
    <field name="id" 
      type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" 
      type="text" indexed="true" stored="true" multiValued="false" />
    <field name="abstractText" 
      type="text" indexed="true" stored="true" multiValued="false" />
    <field name="description" 
      type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
  <!-- type definition left out, have a look in github -->
</schema>
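The field list of the unification schema is simply the union of the field names of the individual schemas. As an illustration only, this sketch derives that union from two schema strings with Python's standard XML parser. The helper names are hypothetical, the schema strings are shortened copies of the ones above, and the sketch does not check that a field shared by both cores has the same type in each.

```python
import xml.etree.ElementTree as ET

# Shortened copies of the two shard schemas shown above.
SHARD_1 = """<schema name="shard-1" version="1.5">
  <fields>
    <field name="id" type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" type="text" indexed="true" stored="true" multiValued="false" />
    <field name="description" type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
</schema>"""

SHARD_2 = """<schema name="shard-2" version="1.5">
  <fields>
    <field name="id" type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" type="text" indexed="true" stored="true" multiValued="false" />
    <field name="abstractText" type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
</schema>"""


def field_names(schema_xml):
    """Return the name attribute of every <field> element, in document order."""
    root = ET.fromstring(schema_xml)
    return [field.get("name") for field in root.iter("field")]


def unified_fields(*schemas):
    """Union of all field names, keeping the order of first appearance."""
    names = []
    for schema_xml in schemas:
        for name in field_names(schema_xml):
            if name not in names:
                names.append(name)
    return names


print(unified_fields(SHARD_1, SHARD_2))
```

The printed list names the four fields that the shard-unification schema has to declare.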

Now I need to make use of this combined schema, so I create a query handler in the solrconfig.xml of the shard-unification core:

<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="qf">id title description abstractText</str>
    <str name="fl">*,score</str>
    <str name="mm">100%</str>
  </lst>
</requestHandler>
<queryParser name="edismax" class="org.apache.solr.search.ExtendedDismaxQParserPlugin" />

That's it. Now shard-1 and shard-2 need some index data. To get a unified result, just query shard-unification with the appropriate shards parameter.

http://localhost/solr/shard-unification/select?q=*:*&rows=100&start=0&wt=json&shards=localhost/solr/shard-1,localhost/solr/shard-2
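If you assemble that URL in code rather than by hand, something like the following sketch could do it. The function name and its defaults are my own assumptions; only the parameter names (q, rows, start, wt, shards) come from the URL above. The safe argument keeps characters such as * and / unescaped so that the result stays readable.

```python
from urllib.parse import urlencode


def distributed_select_url(base, core, shards, query="*:*", rows=100, start=0):
    """Build a select URL on `core` that fans out to the given shards."""
    params = {
        "q": query,
        "rows": rows,
        "start": start,
        "wt": "json",
        # Solr expects the shards as one comma-separated list.
        "shards": ",".join(shards),
    }
    # Leave '*', ':', ',' and '/' unescaped for readability.
    return f"{base}/{core}/select?" + urlencode(params, safe="*:,/")


url = distributed_select_url(
    "http://localhost/solr",
    "shard-unification",
    ["localhost/solr/shard-1", "localhost/solr/shard-2"],
)
print(url)
```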

This will return a result like:

{
  "responseHeader":{
    "status":0,
    "QTime":10},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "id":1,
        "title":"title 1",
        "description":"description 1",
        "score":1.0},
      {
        "id":2,
        "title":"title 2",
        "abstractText":"abstract 2",
        "score":1.0}]
  }}
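Note that each document only carries the fields its own core knows about, so client code should not assume that description or abstractText is present. A small sketch of reading the response above defensively follows; the function name is hypothetical, and falling back from description to abstractText is just one possible choice.

```python
import json

# The sample response from above, as it would arrive over the wire.
RESPONSE = """{
  "responseHeader": {"status": 0, "QTime": 10},
  "response": {"numFound": 2, "start": 0, "maxScore": 1.0, "docs": [
      {"id": 1, "title": "title 1", "description": "description 1", "score": 1.0},
      {"id": 2, "title": "title 2", "abstractText": "abstract 2", "score": 1.0}]
  }}"""


def summaries(response_text):
    """Return (id, title, text) per document, where text is whichever of
    description / abstractText the originating core provided (or None)."""
    docs = json.loads(response_text)["response"]["docs"]
    return [
        (doc["id"], doc["title"], doc.get("description") or doc.get("abstractText"))
        for doc in docs
    ]


for row in summaries(RESPONSE):
    print(row)
```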

Fetch the origin shard of a document

If you want each document to carry its originating shard, you just need to specify [shard] within fl, either as a parameter with the query or within the request handler's defaults, as shown below. The brackets are mandatory, and they will also appear in the resulting response.

<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="qf">id title description abstractText</str>
    <str name="fl">*,score,[shard]</str>
    <str name="mm">100%</str>
  </lst>
</requestHandler>
<queryParser name="edismax" class="org.apache.solr.search.ExtendedDismaxQParserPlugin" />
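With [shard] in fl, a client can group the returned documents by where they came from. The sketch below assumes the key in each document is literally "[shard]", as described above. The shard values in the sample documents are made up for illustration and may look different on your installation.

```python
from collections import defaultdict


def ids_by_shard(docs):
    """Group document ids by the value of their "[shard]" key."""
    grouped = defaultdict(list)
    for doc in docs:
        # Documents without the key end up under "unknown".
        grouped[doc.get("[shard]", "unknown")].append(doc["id"])
    return dict(grouped)


# Hypothetical documents; the "[shard]" values are assumptions.
docs = [
    {"id": 1, "title": "title 1", "[shard]": "localhost/solr/shard-1"},
    {"id": 2, "title": "title 2", "[shard]": "localhost/solr/shard-2"},
]

print(ids_by_shard(docs))
```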

Working Sample

If you want to see a running example, check out my solrsample project on GitHub and execute the ShardUnificationTest. It now also includes the shard fetching.


answered Sep 17 '22 18:09

cheffe
