How to deal with relational data in Solr

We're currently planning to roll out Solr search for an e-commerce site with faceted catalog navigation.

We have a somewhat complex data schema for products and their specification attributes, which are dynamic.

We can't work out how to map this data into Solr. Do we need two indexes, one for products and another for specification attributes mapped to products, or just a single schema?

Either way, how would we do it? Any example would be great.

Dharmik Bhandari asked Jun 12 '12 14:06



1 Answer

Currently you cannot join across multiple Solr indexes. Join functionality is coming in Solr 4.0, but it will only join documents within a single index.
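To illustrate what joining documents within a single index means, here is a minimal in-memory sketch of the semantics of Solr's join query parser (`{!join from=... to=...}`). It assumes, hypothetically, that product and specification documents share one index, are distinguished by a `type` field, and that each spec points at its product through a `product_id` field. This is plain Python mimicking the behavior, not Solr code, and it only handles single-valued fields.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]


def join(index: List[Doc], from_field: str, to_field: str,
         inner: Callable[[Doc], bool]) -> List[Doc]:
    """Mimic {!join from=<from_field> to=<to_field>}<inner query> on ONE
    in-memory 'index'. Single-valued fields only."""
    # Step 1: run the inner query and collect the values of the "from" field.
    keys = {doc[from_field] for doc in index if inner(doc) and from_field in doc}
    # Step 2: return the documents whose "to" field matches one of those values.
    return [doc for doc in index if doc.get(to_field) in keys]


# Hypothetical single index holding both product and specification documents.
index: List[Doc] = [
    {"id": "p1", "type": "product", "name": "Phone A"},
    {"id": "p2", "type": "product", "name": "Phone B"},
    {"id": "s1", "type": "spec", "product_id": "p1", "attr": "color", "value": "red"},
    {"id": "s2", "type": "spec", "product_id": "p2", "attr": "color", "value": "black"},
]

# Roughly analogous to: q={!join from=product_id to=id}type:spec AND value:red
red = join(index, "product_id", "id",
           lambda d: d.get("type") == "spec" and d.get("value") == "red")
print([d["name"] for d in red])
```

The inner query selects specification documents, and the join maps them back to the product documents they reference; both kinds of document have to live in the same index for this to work.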

A normalized database schema must be flattened (denormalized) before it is indexed in Solr. This is actually where most of the run-time performance gain comes from, since joins in a database are expensive.
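A minimal sketch of that flattening step for the product/specification case in the question. It assumes specifications arrive as hypothetical `(product_id, attribute, value)` rows and that the Solr schema declares a multi-valued string dynamic field matching `attr_*` (that field naming is an assumption; adjust it to your schema). Each product becomes one document, and every dynamic specification attribute becomes its own field:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def flatten(products: List[Dict[str, str]],
            specs: List[Tuple[str, str, str]]) -> List[Dict[str, object]]:
    """Denormalize product rows plus (product_id, attribute, value) rows
    into one Solr-style document per product."""
    # Group specification values by product, then by attribute name.
    by_product: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for product_id, attr, value in specs:
        by_product[product_id][attr].append(value)

    docs: List[Dict[str, object]] = []
    for p in products:
        doc: Dict[str, object] = {"id": p["id"], "name": p["name"]}
        # Every dynamic attribute becomes its own multi-valued field,
        # e.g. "attr_color" (assumes an attr_* dynamic field in the schema).
        for attr, values in sorted(by_product.get(p["id"], {}).items()):
            doc[f"attr_{attr}"] = values
        docs.append(doc)
    return docs


products = [{"id": "p1", "name": "Phone A"}, {"id": "p2", "name": "Phone B"}]
specs = [("p1", "color", "red"), ("p1", "storage", "64GB"),
         ("p1", "color", "blue"), ("p2", "color", "black")]
docs = flatten(products, specs)
print(docs)
```

Because each attribute lands in its own field, the specification values can be used directly for faceted navigation, for example with `facet.field=attr_color`.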

Duplicating some columns across Products and Specifications is OK. If you can describe the individual attributes and their cardinality, I could offer more specific advice.

For background: I have indexed a heavily normalized database schema into three Solr indexes. I used cardinality tests and search use cases to narrow down this split. For instance, I had customer agreements in one index, agent agreements in another, and relationships between customers and agents in a third. I settled on the fewest indexes I could get away with. A service tier integrates the three indexes. Creating a single index here would have made it too large and complex to maintain.
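A rough sketch of such a service tier, using hypothetical in-memory stand-ins for the three indexes (customer agreements, agent agreements, and customer/agent relationships). In a real system each `InMemoryIndex` call would be a query against the corresponding Solr index; the class, method, and field names here are illustrative only:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InMemoryIndex:
    """Hypothetical stand-in for one Solr index: documents keyed by id."""
    docs: Dict[str, dict] = field(default_factory=dict)

    def get(self, ids: List[str]) -> List[dict]:
        # Like a lookup by unique key; unknown ids are skipped.
        return [self.docs[i] for i in ids if i in self.docs]

    def search(self, **filters: str) -> List[dict]:
        # Naive exact-match filtering, standing in for a Solr filter query.
        return [d for d in self.docs.values()
                if all(d.get(k) == v for k, v in filters.items())]


class AgreementService:
    """Service tier that integrates three separate indexes by key."""

    def __init__(self, customers: InMemoryIndex, agents: InMemoryIndex,
                 relationships: InMemoryIndex) -> None:
        self.customers = customers
        self.agents = agents
        self.relationships = relationships

    def customer_view(self, customer_id: str) -> Dict[str, List[dict]]:
        # 1. Customer agreement from the customer index.
        customer = self.customers.get([customer_id])
        # 2. Related agent ids from the relationship index.
        rels = self.relationships.search(customer_id=customer_id)
        # 3. Agent agreements resolved from the agent index.
        agents = self.agents.get([r["agent_id"] for r in rels])
        return {"customer": customer, "agents": agents}


svc = AgreementService(
    customers=InMemoryIndex({"c1": {"id": "c1", "agreement": "Gold plan"}}),
    agents=InMemoryIndex({"a1": {"id": "a1", "agreement": "Broker A"},
                          "a2": {"id": "a2", "agreement": "Broker B"}}),
    relationships=InMemoryIndex({"r1": {"id": "r1", "customer_id": "c1",
                                        "agent_id": "a2"}}),
)
print(svc.customer_view("c1"))
```

Each index stays small and focused, and the cost of keeping them separate is that the cross-index stitching happens in application code rather than in Solr.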

Another approach you can try is to search Solr first and then enrich the individual result documents with a database lookup. This comes at some cost, but if the Solr search has already resolved the primary keys, these lookups will not be very expensive.
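A minimal sketch of the search-then-enrich pattern, assuming Solr returns only primary keys (for example requested with `fl=id`) in relevance order, and using an in-memory SQLite table with a hypothetical `product` schema in place of the real database. The lookup is batched into one `IN (...)` query, so the cost is one round trip per result page rather than one per document:

```python
import sqlite3
from typing import Dict, List


def enrich(conn: sqlite3.Connection, solr_ids: List[str]) -> List[Dict[str, object]]:
    """Fetch full rows for the primary keys returned by a Solr search in one
    batched query, preserving Solr's relevance order."""
    if not solr_ids:
        return []
    placeholders = ",".join("?" for _ in solr_ids)
    rows = conn.execute(
        f"SELECT id, name, price FROM product WHERE id IN ({placeholders})",
        solr_ids,
    ).fetchall()
    by_id = {r[0]: {"id": r[0], "name": r[1], "price": r[2]} for r in rows}
    # Keep Solr's ranking; skip ids that no longer exist in the database.
    return [by_id[i] for i in solr_ids if i in by_id]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id TEXT PRIMARY KEY, name TEXT, price REAL)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [("p1", "Phone A", 199.0), ("p2", "Phone B", 299.0)])

# Hypothetical ids as they might come back from Solr, in rank order.
hits = ["p2", "p1", "p9"]
print(enrich(conn, hits))
```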

user1452132 answered Sep 20 '22 18:09