Let me preface by mentioning that I've been through everything I could find about this topic including the Solr docs and all of the SO questions.
I have a Solr instance that I've set up with a Data Import Handler (DIH) to pull in data from MSSQL using the JDBC driver. The data comes in, but it isn't structured as I'd expect based on the Solr DIH documentation:
<document>
<entity>
<entity />
</entity>
</document>
I've tried all the attributes I could find, such as rootEntity and flatten, as well as CachedSqlEntityProcessor. With multiValued="true", the result ends up as:
docs [
{
recordId: '1234',
name: 'whatever',
subrows_col1: ['x','y','z'],
subrows_col2: ['a','b','c']
}
]
What I'm looking for is:
docs [
{
recordId: '1234',
name: 'whatever',
subrows: [{
col1: 'x',
col2: 'a'
},
{
col1: 'y',
col2: 'b'
},
{
col1: 'z',
col2: 'c'
}]
} ]
I've seen the block-join stuff, but I'm confused as to where it goes. I added
<add>
<doc>
<field />
<doc>
<field />
</doc>
</doc>
</add>
to the DIH requestHandler, but it did nothing. I added it to the /update requestHandler and got an error. I have no clue where that is supposed to go. Does it only work during a query, or is it only for when you push data to Solr via /update?
Where do I define the structure for the document? I tried nested fields in the schema, entities in the DIH config, and the block-join stuff in the requestHandlers. Nothing has worked yet.
Obviously I'm missing something.
Indexing nested documents in DIH is supported from Solr 5.1 onwards.
https://issues.apache.org/jira/browse/SOLR-5147
Simply add child="true" to the child entity, and DIH will automatically index its rows as child documents.
Example taken from the JIRA issue linked above:
<document>
  <entity name='PARENT' query='select * from PARENT'>
    <field column='id' />
    <field column='desc' />
    <field column='type_s' />
    <entity child='true' name='CHILD' query="select * from CHILD where parent_id='${PARENT.id}'">
      <field column='id' />
      <field column='desc' />
      <field column='type_s' />
    </entity>
  </entity>
</document>
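Once the children are indexed this way, a plain query returns parents and children as separate flat documents; to get the nested shape the question asks for, the children have to be requested explicitly. The following is a minimal sketch of building such a request with Solr's `[child]` document transformer. The host, the core name `mycore`, and the assumption that parent rows carry `type_s` = `parent` are all hypothetical and not taken from the original post.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ChildQuerySketch {

    // URL-encode a single query parameter value.
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Build a /select URL that matches parent documents and asks Solr to
    // attach their child documents via the [child] transformer.
    // parentFilter must match ALL parent documents and no children
    // (assumed here to be something like "type_s:parent").
    static String buildSelectUrl(String baseUrl, String parentFilter) {
        String q = parentFilter;
        String fl = "*,[child parentFilter=" + parentFilter + "]";
        return baseUrl + "/select?q=" + enc(q) + "&fl=" + enc(fl) + "&wt=json";
    }

    public static void main(String[] args) {
        // Hypothetical host and core name.
        System.out.println(buildSelectUrl("http://localhost:8983/solr/mycore", "type_s:parent"));
    }
}
```

The sketch only builds the URL; sending it is left to whatever HTTP client you already use. It relies on some field distinguishing parents from children, which is why the JIRA example includes `type_s` on both entities.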
I also decompiled DocBuilder.class in solr-dataimporthandler-5.3.0.jar and found this code snippet:
if (doc != null) {
    if (epw.getEntity().isChild())
    {
        childDoc = new DocWrapper();
        handleSpecialCommands(arow, childDoc);
        addFields(epw.getEntity(), childDoc, arow, vr);
        doc.addChildDocument(childDoc);
    }
    else
    {
        handleSpecialCommands(arow, doc);
        addFields(epw.getEntity(), doc, arow, vr);
    }
}
Note that epw.getEntity().isChild() returns true when child="true" is set, so DIH creates a new DocWrapper and adds it as a child document instead of simply adding the entity's columns as a bunch of new fields on the parent.
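To make the difference concrete, here is a small self-contained sketch of that branch. It does not use any Solr classes: `Doc` is a hypothetical stand-in for DIH's DocWrapper, and `addRow` and `build` are illustrative names, not DIH methods. The data is the sample from the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChildEntitySketch {

    // Hypothetical stand-in for DocWrapper: multi-valued fields plus child docs.
    static class Doc {
        final Map<String, List<Object>> fields = new LinkedHashMap<>();
        final List<Doc> children = new ArrayList<>();

        void addField(String name, Object value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }

        void addChildDocument(Doc child) {
            children.add(child);
        }
    }

    // Same decision as the decompiled snippet: either nest the row as its
    // own document, or merge its columns into the parent's fields.
    static void addRow(Doc parent, Map<String, Object> row, boolean isChild) {
        if (isChild) {
            Doc childDoc = new Doc();
            row.forEach(childDoc::addField);
            parent.addChildDocument(childDoc);
        } else {
            row.forEach(parent::addField);
        }
    }

    // Build the sample record from the question with three sub-rows.
    static Doc build(boolean isChild) {
        Doc parent = new Doc();
        parent.addField("recordId", "1234");
        parent.addField("name", "whatever");
        String[][] subrows = {{"x", "a"}, {"y", "b"}, {"z", "c"}};
        for (String[] r : subrows) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("col1", r[0]);
            row.put("col2", r[1]);
            addRow(parent, row, isChild);
        }
        return parent;
    }

    public static void main(String[] args) {
        Doc flat = build(false);
        Doc nested = build(true);
        System.out.println("flat fields:   " + flat.fields);
        System.out.println("nested fields: " + nested.fields
                + ", children: " + nested.children.size());
    }
}
```

Without the child flag, every sub-row's columns pile up as multi-valued fields on the parent, which is the flattened result described in the question. With it, each sub-row stays together as its own document, so the pairing of col1 and col2 values is preserved.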