 

Indexing different types of entities/objects with Solr/Lucene

Let's say I want to index my shop using Solr/Lucene.

I have many types of entities: products, product reviews, articles.

How do I get Lucene to index those types, each with a different schema?

Yos asked Jun 14 '10 11:06


2 Answers

I recommend building your index so that all of your entities share more or less the same basic fields: title, content, url, uuid, entity_type, entity_sourcename, etc. If each entity has its own unique set of index fields, you'll have a hard time constructing a query that searches all entities simultaneously, and your results view may become a huge mess. If you need specific fields for a specific entity, add them and apply special logic for that entity based on its entity_type.

I'm speaking from experience: we manage an index with over 10 different entity types, and this approach works like a charm.
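
As a rough illustration of the shared-fields idea, here is a minimal Python sketch that flattens different entities into one document shape. The common field names come from the list above; the helper `to_index_doc`, the sample values, and the extra fields (`price`, `rating`, `product_uuid`) are made up for the example and are not part of any Solr or Lucene API.

```python
# Fields every indexed document carries, whatever its entity type.
COMMON_FIELDS = ("uuid", "title", "content", "url", "entity_type", "entity_sourcename")


def to_index_doc(entity_type, entity_sourcename, uuid, title, content, url, **extra):
    """Build one flat document: the shared fields plus optional entity-specific ones."""
    doc = {
        "uuid": uuid,
        "title": title,
        "content": content,
        "url": url,
        "entity_type": entity_type,
        "entity_sourcename": entity_sourcename,
    }
    # Entity-specific fields (hypothetical names) sit next to the common ones.
    doc.update(extra)
    return doc


product = to_index_doc(
    "product", "shop_db", "p-1", "Red kettle",
    "A 1.7 l electric kettle.", "/products/1",
    price=24.99,
)
review = to_index_doc(
    "product_review", "shop_db", "r-7", "Boils fast",
    "Works well, a bit loud.", "/products/1/reviews/7",
    rating=4, product_uuid="p-1",
)
```

Both documents can go into the same index; code that renders results only needs the common fields and can branch on `entity_type` when it wants the extras.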

P.S. A few other simple pieces of advice:

  1. Make sure your Lucene document contains all the data needed to construct the result and show it to the user, so that you don't need to go to the database. Lucene queries are generally much faster than database queries.
  2. If you absolutely need the database to construct your result set (e.g. to apply permissions), run the Lucene query first to narrow the results, then the database query to filter them.
  3. Don't be afraid to add custom fields to some of your documents if you need them: think of a Lucene document as a key-value datastore.
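
Point 2 could look roughly like the sketch below. It assumes the index query has already run and returned hits carrying a `uuid`; the `permissions` table, its columns, and the function name are hypothetical, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3


def filter_by_permission(conn, user_id, hits):
    """Keep only the index hits this user may see, preserving the hit order."""
    if not hits:
        return []
    uuids = [hit["uuid"] for hit in hits]
    placeholders = ",".join("?" for _ in uuids)
    rows = conn.execute(
        f"SELECT uuid FROM permissions WHERE user_id = ? AND uuid IN ({placeholders})",
        [user_id, *uuids],
    ).fetchall()
    allowed = {row[0] for row in rows}
    return [hit for hit in hits if hit["uuid"] in allowed]


# Hypothetical permissions table, kept in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE permissions (user_id INTEGER, uuid TEXT)")
conn.executemany(
    "INSERT INTO permissions VALUES (?, ?)",
    [(1, "p-1"), (1, "a-3"), (2, "r-7")],
)

# Stand-in for the hits an index query returned, in relevance order.
hits = [
    {"uuid": "r-7", "title": "Boils fast", "entity_type": "product_review"},
    {"uuid": "p-1", "title": "Red kettle", "entity_type": "product"},
    {"uuid": "a-3", "title": "Choosing a kettle", "entity_type": "article"},
]

visible = filter_by_permission(conn, 1, hits)
print([hit["uuid"] for hit in visible])  # prints ['p-1', 'a-3']
```

The database only sees the handful of ids the index already selected, and the titles shown to the user come from the stored index fields rather than from a second lookup.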
buru answered Nov 15 '22 21:11


Multi-core is an approach to use with care. With a simple schema like yours, it's better to do as buru recommends: find the fields common to your different entities, then the fields used by only one or a few of them. You can then add a "type" or "type_id" field that says whether the entity is a product, a product review, and so on.

Doing so lets you keep a single index and process queries quickly.
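
To show how a "type" field might be used at query time, here is a small Python sketch that builds a Solr select URL and restricts it with a filter query (`fq`). The host, the core name `shop`, and the function name are assumptions for illustration, and the sketch also assumes the core has a default search field configured so that a bare `q` term works.

```python
from urllib.parse import urlencode

# Assumed local Solr URL and core name; adjust to the actual deployment.
SOLR_SELECT = "http://localhost:8983/solr/shop/select"


def build_search_url(text, entity_types=None, rows=10):
    """Build a select URL; with entity_types given, restrict hits via fq on "type"."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    if entity_types:
        # Quote each value and OR them together, e.g. type:("product" OR "article").
        joined = " OR ".join(f'"{t}"' for t in entity_types)
        params.append(("fq", f"type:({joined})"))
    return f"{SOLR_SELECT}?{urlencode(params)}"


# Search every entity type at once.
all_url = build_search_url("kettle")

# Search only products and product reviews.
some_url = build_search_url("kettle", ["product", "product_review"])
```

The same index serves both cases: leaving out the filter searches everything, and adding it narrows the results to the chosen types without needing a separate core per entity.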

Guillaume Lebourgeois answered Nov 15 '22 22:11