
Solr schema design for many-to-many entity definitions

Tags:

solr

I am trying to design a schema for a scenario where there is a many-to-many relation between Products and Suppliers. Searches can be product-centric or supplier-centric. A product can be supplied by many suppliers, and a supplier can supply many products. Below is the solution I am considering, but there seems to be a lot of redundancy in the field definitions. Do I really need two entity definitions to support both product-centric and supplier-centric searches? It does not look optimal.

When searching for a supplier, "product" can be defined with multiValued="true". When searching for a product, "supplier" can be defined with multiValued="true".

<!-- Field definitions to support supplier search -->
<field name="s_supplier" type="string" indexed="true" stored="true"/>
<field name="s_product" type="string" indexed="true" stored="true" multiValued="true"/>

<!-- Field definitions to support product search -->
<field name="p_product" type="string" indexed="true" stored="true"/>
<field name="p_supplier" type="string" indexed="true" stored="true" multiValued="true"/>

The entity definition in the data import handler configuration is:

<entity name="products" ....>
   <field name="p_product" column=""/>
   <entity name="suppliers">
      <field name="p_supplier"/>
   </entity>
</entity>

<entity name="suppliers" ....>
   <field name="s_supplier" column=""/>
   <entity name="products">
      <field name="s_product" column=""/>
   </entity>
</entity>
tech20nn asked Sep 14 '11 10:09


1 Answer

The beauty of Solr is that you can pick a single schema definition, either product-centric or supplier-centric, and then rely on Solr's query features to get the results you want. Let's say you go with a product-centric schema, using the following:

<field name="product" type="string" indexed="true" stored="true"/>
<field name="supplier" type="string" indexed="true" stored="true" multiValued="true"/>

You can now search for products by querying the product field, for example product:"my product" (quote multi-word values so the whole value is matched against that field). To search for a specific supplier, use supplier:"my supplier". Because supplier is a multivalued field on each product document, you get back every product that supplier is associated with.
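As a rough illustration of the two queries above, here is a minimal Python sketch that builds the request URLs for Solr's select handler. The host, port, and the core name `products` are assumptions made for this example, and `field_query` and `select_url` are hypothetical helper names, not part of any Solr client library.

```python
from urllib.parse import urlencode

# Assumed for illustration only: adjust to your own Solr host and core name.
SOLR_SELECT_URL = "http://localhost:8983/solr/products/select"


def field_query(field, value):
    """Return a Solr query clause matching one whole value in a field.

    The value is wrapped in double quotes so a multi-word value is kept
    together; backslashes and embedded quotes are escaped first.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def select_url(query, rows=10):
    """Build a select URL with the query and paging parameters URL-encoded."""
    return SOLR_SELECT_URL + "?" + urlencode({"q": query, "rows": rows, "wt": "json"})


# Product-centric search: find the document for one product.
product_url = select_url(field_query("product", "my product"))

# Supplier-centric search against the same schema: every product document
# whose multivalued supplier field contains this supplier.
supplier_url = select_url(field_query("supplier", "my supplier"))

print(product_url)
print(supplier_url)
```

Both searches run against the same product-centric documents; only the field named in the query changes.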

Another option, for greater flexibility, is to use the text field defined in the example schema.xml file together with the copyField directive to copy both the supplier and product values into that one field. You can then search that field for either kind of value, and every document that matches the query term in either the supplier or the product field is returned.

Here is an example, still using the fields defined above.

<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="product" dest="text" />
<copyField source="supplier" dest="text" />

Then, if you search for text:my term, the term can be either a product or a supplier, and all documents in the index that match on that field are returned. Please note that the text field has specific index-time and query-time analyzers applied to it, so you should be aware of how they transform your values.

Also, if you need to produce a list of unique suppliers, you can use Solr faceting to get that list, either from all of the documents in the index or only from those matching the current search criteria.
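A minimal Python sketch of such a facet request, assuming the product-centric schema above: it builds a select URL that asks for supplier facet counts and no document rows. The host, port, and core name `products` are assumptions for illustration, and `supplier_facet_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed for illustration only: adjust to your own Solr host and core name.
SOLR_SELECT_URL = "http://localhost:8983/solr/products/select"


def supplier_facet_url(query="*:*"):
    """Build a URL that returns the distinct supplier values for a query.

    With the default query (*:*) the facet covers every document in the
    index; pass a narrower query to list only the suppliers of the
    products matching the current search.
    """
    params = [
        ("q", query),
        ("rows", 0),                  # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", "supplier"),
        ("facet.mincount", 1),        # skip suppliers with no matching product
        ("facet.limit", -1),          # do not cap the number of facet values
        ("wt", "json"),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


all_suppliers_url = supplier_facet_url()
suppliers_for_product_url = supplier_facet_url('product:"my product"')

print(all_suppliers_url)
print(suppliers_for_product_url)
```

Faceting on the supplier field works well here because it is an untokenized string field, so each facet value is a complete supplier name.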

Please see some of the following references for more details on these topics:

  • Analyzer, Tokenizers and Token Filters
  • Schema
  • Faceting Overview
Paige Cook answered Nov 15 '22 12:11