I am evaluating Solr 4.0 and Elasticsearch 0.20.5 for LinkedIn-style search, and I am wondering how to store the normalized data of a user profile in Solr. In Elasticsearch this is easily achieved using nested documents.
For example, a person JSON document:
{
  "first_name": "abc",
  "last_name": "xyz",
  "school": [
    {
      "name": "some school",
      "degree": "x-Degree",
      "startDate": "12-02-2009"
    },
    {
      "name": "some school2",
      "degree": "x-Degree-2",
      "startDate": "12-02-2012"
    }
  ]
}
I want to search on a user's school names, degrees, and whether they are currently studying, similar to LinkedIn search.
What is the best way to index and search this in Solr?
Unfortunately, Solr is not as capable of defining nested documents as Elasticsearch.
In Solr's case, the answer is to use multiValued fields that mimic the desired information in a flattened document. Personally, I find this very limiting, particularly because the grouped details (objects) become separated from one another, but it is the Solr way. You can use dynamic fields to work around this (e.g., school_name_1 is linked with school_degree_1, and school_name_2 with school_degree_2), as suggested by arun's referenced link, but that is a much bigger hassle compared to the flexibility of Elasticsearch.
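As a rough illustration of the dynamic-field workaround, the nested schools could be flattened into numbered fields before indexing. This is only a sketch in plain Python: the helper name flatten_with_suffixes is made up for illustration, and it assumes the schema declares matching dynamic fields (for example school_name_* and school_degree_*).

```python
def flatten_with_suffixes(person):
    """Flatten nested schools into numbered fields such as school_name_1.

    Hypothetical helper: assumes the Solr schema declares dynamic fields
    matching school_name_* and school_degree_*.
    """
    # Copy every top-level field except the nested list.
    doc = {key: value for key, value in person.items() if key != "school"}
    # The shared numeric suffix keeps each name paired with its degree.
    for index, school in enumerate(person.get("school", []), start=1):
        doc[f"school_name_{index}"] = school.get("name")
        doc[f"school_degree_{index}"] = school.get("degree")
    return doc


person = {
    "first_name": "abc",
    "last_name": "xyz",
    "school": [
        {"name": "some school", "degree": "x-Degree"},
        {"name": "some school2", "degree": "x-Degree-2"},
    ],
}

numbered_doc = flatten_with_suffixes(person)
```

The pairing survives because both values share a suffix, but every query then has to name each numbered field explicitly, which is the hassle mentioned above.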
If your document is in XML, then you can use the XPathEntityProcessor to automatically flatten it. Unfortunately, I am not aware of any JSON processor that performs the analogous action.
You're going to want a schema similar to:
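<!-- The type values below assume the field types from Solr's example schema; substitute your own. -->
<field name="first_name" type="text_general" indexed="true" stored="true" />
<field name="last_name" type="text_general" indexed="true" stored="true" />
<field name="school_name" type="text_general" multiValued="true" indexed="true" stored="true" />
<field name="school_degree" type="text_general" multiValued="true" indexed="true" stored="true" />
<!-- Solr date fields expect ISO 8601 values such as 2009-02-12T00:00:00Z. -->
<field name="school_start_date" type="tdate" multiValued="true" indexed="true" stored="true" />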
Don't forget about the end date. You may also want to consider that students can have multiple degrees, though this could be solved by simply doubling up on the school, or making the degree an array when it's the same starting year.
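Since there is no ready-made JSON flattener, one option is to flatten the document yourself before sending it to Solr. The following is a minimal sketch in plain Python, not a Solr API; flatten_person is a hypothetical helper, and the output field names simply follow the schema above.

```python
def flatten_person(person):
    """Turn the nested person document into parallel multiValued lists.

    Hypothetical helper: values at the same list position belong to the
    same school, but Solr itself does not enforce that pairing at query time.
    """
    schools = person.get("school", [])
    return {
        "first_name": person.get("first_name"),
        "last_name": person.get("last_name"),
        "school_name": [s.get("name") for s in schools],
        "school_degree": [s.get("degree") for s in schools],
        # Dates are passed through untouched; converting them to ISO 8601
        # is left out because the source format (12-02-2009) is ambiguous.
        "school_start_date": [s.get("startDate") for s in schools],
    }


person = {
    "first_name": "abc",
    "last_name": "xyz",
    "school": [
        {"name": "some school", "degree": "x-Degree", "startDate": "12-02-2009"},
        {"name": "some school2", "degree": "x-Degree-2", "startDate": "12-02-2012"},
    ],
}

flat_doc = flatten_person(person)
```

The resulting dictionary can then be serialized and posted with whatever Solr client you already use.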
I'm sure you can achieve exactly what you want. There are many field types and community plugins. The only problem is that good documentation is hard to find.
You can obviously go for multiValued fields as @pickypg suggested. The problem occurs when you try to search by school_name and school_degree in one query: the results will be incorrect, because a name from one school and a degree from a different school can both match the same document.
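To make the cross-matching problem concrete, here is a small model of it in plain Python. It does not call Solr; it only imitates, with exact string matching, how each clause of a query like school_name:X AND school_degree:Y is checked against its own multiValued field. The function names are hypothetical.

```python
# Flattened document: the pairing between names and degrees is lost.
flat_doc = {
    "school_name": ["some school", "some school2"],
    "school_degree": ["x-Degree", "x-Degree-2"],
}

# The same data kept as nested objects, for comparison.
nested_schools = [
    {"name": "some school", "degree": "x-Degree"},
    {"name": "some school2", "degree": "x-Degree-2"},
]


def matches_flat(doc, name, degree):
    # Each clause looks at its own field independently of the other.
    return name in doc["school_name"] and degree in doc["school_degree"]


def matches_nested(schools, name, degree):
    # Both conditions must hold for the same school object.
    return any(s["name"] == name and s["degree"] == degree for s in schools)


# Nobody earned x-Degree-2 at "some school", yet the flat model matches.
flat_hit = matches_flat(flat_doc, "some school", "x-Degree-2")
nested_hit = matches_nested(nested_schools, "some school", "x-Degree-2")
```

Here flat_hit is True while nested_hit is False, which is the kind of false positive described above.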
What I'm doing for a slightly different problem is using the PointType class:
<fieldType name="range" class="solr.PointType" dimension="1" subFieldType="double" />
<field name="cat_lr" type="range" indexed="true" stored="true" multiValued="true"/>
It allows me to have multiple ranges per document. I insert them like this:
cat_lr=2,5
and I look for them like this:
+cat_lr:[1 TO 10]
I hope that helps with your issue. Good luck with the documentation.