
Solr 4.0: storing and searching normalized profile data

I am evaluating Solr 4.0 and Elasticsearch 0.20.5 for LinkedIn-style search, and I am wondering how to store the normalized data of a user profile. In Elasticsearch this is easy to achieve with nested documents.

For example, a person as JSON:

{
    "first_name": "abc",
    "last_name": "xyz",
    "school": [
        {
            "name": "some school",
            "degree": "x-Degree",
            "startDate": "12-02-2009"
        },
        {
            "name": "some school2",
            "degree": "x-Degree-2",
            "startDate": "12-02-2012"
        }
    ]
}

I want to search on a user's school names, degrees, and whether they are currently studying, similar to LinkedIn search.

What's the best way to index and search it in Solr?

maaz asked Nov 04 '22 03:11


2 Answers

Unfortunately, Solr is not as capable of defining nested documents as Elasticsearch.

In Solr's case, the answer is to use multiValued fields that mimic the desired information in a flattened document. Personally, I find this very limiting, particularly because the details that belonged to one object (a school's name and its degree) are no longer tied together once they sit in separate fields, but it is the Solr way. You can use dynamic fields to work around this (e.g., school_name_1 is linked with school_degree_1 and school_name_2 with school_degree_2), as suggested by arun's referenced link, but that is a much bigger hassle compared to the flexibility of Elasticsearch.
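As a rough illustration of the numbered dynamic-field workaround, here is a minimal Python sketch that turns the nested person from the question into a flat document. The function name `flatten_numbered` and the `school_<key>_<n>` naming are assumptions for illustration only; the Solr schema would also need matching dynamicField patterns (for example `school_name_*`), which are not shown here.

```python
def flatten_numbered(person):
    """Flatten nested schools into numbered fields such as school_name_1.

    Hypothetical helper: the field naming scheme is an assumption,
    not something Solr prescribes.
    """
    # Copy every top-level field except the nested list of schools.
    doc = {key: value for key, value in person.items() if key != "school"}
    # Number the schools from 1 so that school_name_1 pairs with school_degree_1.
    for index, school in enumerate(person.get("school", []), start=1):
        for key, value in school.items():
            doc[f"school_{key}_{index}"] = value
    return doc


person = {
    "first_name": "abc",
    "last_name": "xyz",
    "school": [
        {"name": "some school", "degree": "x-Degree", "startDate": "12-02-2009"},
        {"name": "some school2", "degree": "x-Degree-2", "startDate": "12-02-2012"},
    ],
}

doc = flatten_numbered(person)
```

The pairing survives because the suffix ties the fields together, but a query now has to name each numbered field explicitly, which is the hassle mentioned above.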

If your document is in XML, then you can use the XPathEntityProcessor to automatically flatten it. Perhaps more unfortunately, I am not aware of any JSON processor that performs the analogous action.

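You're going to want a schema similar to the following. Every field needs a type; the type names here assume the field types defined in Solr's example schema.xml, so adjust them to your own schema. Note that a date field expects ISO 8601 values.

<field name="first_name" type="text_general" indexed="true" />
<field name="last_name" type="text_general" indexed="true" />
<field name="school_name" type="text_general" multiValued="true" indexed="true" />
<field name="school_degree" type="text_general" multiValued="true" indexed="true" />
<field name="school_start_date" type="date" multiValued="true" indexed="true" />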

Don't forget about the end date; you need it to tell whether someone is currently studying. You may also want to consider that a student can earn multiple degrees at the same school, though this could be solved by simply repeating the school, or by making the degree an array when the starting year is the same.
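Since no ready-made JSON flattener is mentioned here, the flattening can be done on the client before indexing. The following is a minimal Python sketch under that assumption: `flatten_multivalued` is a hypothetical helper, the output field names follow the multiValued school fields suggested in this answer, and the dates are passed through as plain strings without converting them to whatever format the date field requires.

```python
def flatten_multivalued(person):
    """Build a flat document with one list per multiValued school field.

    Hypothetical helper: it only reshapes the data and does not talk to Solr.
    """
    schools = person.get("school", [])
    return {
        "first_name": person["first_name"],
        "last_name": person["last_name"],
        # One entry per school; list positions line up across the three fields.
        "school_name": [school["name"] for school in schools],
        "school_degree": [school["degree"] for school in schools],
        "school_start_date": [school["startDate"] for school in schools],
    }


person = {
    "first_name": "abc",
    "last_name": "xyz",
    "school": [
        {"name": "some school", "degree": "x-Degree", "startDate": "12-02-2009"},
        {"name": "some school2", "degree": "x-Degree-2", "startDate": "12-02-2012"},
    ],
}

flat_doc = flatten_multivalued(person)
```

The list positions line up in the document you send, but a query against the multiValued fields does not take that alignment into account.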

pickypg answered Nov 15 '22 05:11


I'm sure you can achieve exactly what you want. There are many field types and community plugins. The only problem is that good documentation is hard to find.

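You can obviously go for multiValued fields like @pickypg suggested. The problem occurs when you try to search by school_name and school_degree in one query: each field is matched independently, so a person can match on the name of one school and the degree from another, and the results will be incorrect.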
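To make the problem concrete, here is a small Python simulation. It is not Solr code; `matches_flat` and `matches_nested` are hypothetical helpers that mimic, in a simplified exact-match way, how an AND over two multiValued fields behaves compared with a match against a single school object.

```python
def matches_flat(doc, name, degree):
    """Mimic an AND query over two multiValued fields.

    Each clause is checked on its own field, with no link between values.
    """
    return name in doc["school_name"] and degree in doc["school_degree"]


def matches_nested(person, name, degree):
    """Require the name and the degree to belong to the same school."""
    return any(
        school["name"] == name and school["degree"] == degree
        for school in person["school"]
    )


person = {
    "school": [
        {"name": "some school", "degree": "x-Degree"},
        {"name": "some school2", "degree": "x-Degree-2"},
    ]
}
flat = {
    "school_name": ["some school", "some school2"],
    "school_degree": ["x-Degree", "x-Degree-2"],
}

# Nobody earned x-Degree-2 at "some school", yet the flat document matches.
false_positive = matches_flat(flat, "some school", "x-Degree-2")
correct_answer = matches_nested(person, "some school", "x-Degree-2")
```

Here `false_positive` is True while `correct_answer` is False, which is the kind of incorrect result to expect from the flattened layout.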

What I'm doing for a slightly different problem is using the PointType class:

<fieldType name="range" class="solr.PointType" dimension="1" subFieldType="double" />

<field name="cat_lr" type="range" indexed="true" stored="true" multiValued="true"/>

It allows me to have multiple ranges per document. I insert them like this:

cat_lr=2,5

and I look for them like this:

+cat_lr:[1 TO 10]

I hope that helps with your issue. Good luck with the documentation.

Lukasz Kujawa answered Nov 15 '22 05:11