
Problems on Elasticsearch with parent-child documents

Tags:

elasticsearch

parent-child

We work with two types of documents in Elasticsearch (ES): items and slots, where items are the parents of slot documents. We define the index with the following command:

curl -XPOST 'localhost:9200/items' -d @itemsdef.json

where itemsdef.json has the following definition:

{
"mappings" : {
    "item" : {
        "properties" : {
            "id" : {"type" : "long" },
            "name" : {
                "type" : "string",
                "_analyzer" : "textIndexAnalyzer"   
            },
            "location" : {"type" : "geo_point" }
        }
    }
},
"settings" : {
    "analysis" : {
        "analyzer" : {

                "activityIndexAnalyzer" : {
                    "alias" : ["activityQueryAnalyzer"],
                    "type" : "custom",
                    "tokenizer" : "whitespace",
                    "filter" : ["trim", "lowercase", "asciifolding", "spanish_stop", "spanish_synonym"]
                },
                "textIndexAnalyzer" : {
                    "type" : "custom",
                    "tokenizer" : "whitespace",
                    "filter" : ["word_delimiter_impl", "trim", "lowercase", "asciifolding", "spanish_stop", "spanish_synonym"]
                },
                "textQueryAnalyzer" : {
                    "type" : "custom",
                    "tokenizer" : "whitespace",
                    "filter" : ["trim", "lowercase", "asciifolding", "spanish_stop"]
                }       
        },
        "filter" : {        
                "spanish_stop" : {
                    "type" : "stop",
                    "ignore_case" : true,
                    "enable_position_increments" : true,
                    "stopwords_path" : "analysis/spanish-stopwords.txt"
                },
                "spanish_synonym" : {
                    "type" : "synonym",
                    "synonyms_path" : "analysis/spanish-synonyms.txt"
                },
                "word_delimiter_impl" : {
                    "type" : "word_delimiter",
                    "generate_word_parts" : true,
                    "generate_number_parts" : true,
                    "catenate_words" : true,
                    "catenate_numbers" : true,
                    "split_on_case_change" : false                  
                }               
        }
    }
}
}

Then we add the child document definition using the following command:

curl -XPOST 'localhost:9200/items/slot/_mapping' -d @slotsdef.json

Where slotsdef.json has the following definition:

{
"slot" : {
    "_parent" : {"type" : "item"},
    "_routing" : {
        "required" : true,
        "path" : "parent_id"
    },
    "properties": {
        "id" : { "type" : "long" },
        "parent_id" : { "type" : "long" },
        "activity" : {
            "type" : "string",
            "_analyzer" : "activityIndexAnalyzer"
        },
        "day" : { "type" : "integer" },
        "start" : { "type" : "integer" },
        "end" :  { "type" : "integer" }
    }
}   
}

Finally we perform a bulk index with the following command:

curl -XPOST 'localhost:9200/items/_bulk' --data-binary @testbulk.json

Where testbulk.json holds the following data:

{"index":{"_type": "item", "_id":35}}
{"location":[40.4,-3.6],"id":35,"name":"A Name"}
{"index":{"_type":"slot","_id":126,"_parent":35}}
{"id":126,"start":1330,"day":1,"end":1730,"activity":"An Activity","parent_id":35}

The ES Head plugin shows that the definitions look correct. We tested the analyzers to confirm that they are loaded and working. Both documents appear in the ES Head browser view. But when we try to retrieve the child document through the API, ES responds that it does not exist:

$ curl -XGET 'localhost:9200/items/slot/126'
{"_index":"items","_type":"slot","_id":"126","exists":false}

When we import 50 documents, all parent documents can be retrieved through the API, but only some of the requests for child documents get a successful response.

My guess is that it has something to do with how documents are distributed across shards and with routing, which I do not fully understand.

Any clue on how to retrieve individual child documents? ES Head shows they have been stored, but HTTP GETs to localhost:9200/items/slot/XXX randomly respond with "exists":false.

asked Nov 25 '12 01:11

Daniel Cerecedo

1 Answer

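Child documents are routed by their parent's id, so they are stored on the shard selected by the parent id. A plain GET routes on the child's own id and may look in a different shard. To retrieve a child document, you need to pass the parent id in the routing parameter of your request: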

curl "localhost:9200/items/slot/126?routing=35"

If the parent id is not available, you will have to search for the child document, since a search queries all shards:

curl "localhost:9200/items/slot/_search?q=id:126"

or switch to an index with a single shard.
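
To illustrate why the plain GET only succeeds for some child documents, here is a minimal sketch in Python that simulates shard selection. It is a toy model, not Elasticsearch code: `ToyIndex` and `shard_for` are hypothetical names, CRC32 stands in for whatever hash Elasticsearch really uses, and the shard count of 5 is an assumption. The only behavior it tries to reproduce is that a document lands on the shard chosen by hashing its routing value (the parent id for children, the document's own id otherwise).

```python
import zlib

NUM_SHARDS = 5  # assumption: number of primary shards in the index


def shard_for(routing_key, num_shards=NUM_SHARDS):
    """Pick a shard from a routing key (CRC32 is a stand-in hash)."""
    return zlib.crc32(str(routing_key).encode("utf-8")) % num_shards


class ToyIndex:
    """Toy model of an index split into shards (hypothetical, for illustration)."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.shards = [{} for _ in range(num_shards)]

    def index(self, doc_type, doc_id, source, parent=None):
        # Children are routed by the parent id, everything else by its own id.
        routing = doc_id if parent is None else parent
        shard = self.shards[shard_for(routing, len(self.shards))]
        shard[(doc_type, str(doc_id))] = source

    def get(self, doc_type, doc_id, routing=None):
        # A GET looks in exactly one shard: the one chosen by the routing
        # value, or by the document id when no routing is given.
        key = doc_id if routing is None else routing
        shard = self.shards[shard_for(key, len(self.shards))]
        return shard.get((doc_type, str(doc_id)))

    def search(self, doc_type, doc_id):
        # A search fans out to every shard, so no routing is needed.
        for shard in self.shards:
            hit = shard.get((doc_type, str(doc_id)))
            if hit is not None:
                return hit
        return None


if __name__ == "__main__":
    idx = ToyIndex()
    idx.index("item", 35, {"name": "A Name"})
    idx.index("slot", 126, {"activity": "An Activity"}, parent=35)

    print("GET slot/126            ->", idx.get("slot", 126))
    print("GET slot/126?routing=35 ->", idx.get("slot", 126, routing=35))
    print("search id:126           ->", idx.search("slot", 126))
```

In this model the GET without routing finds the slot only when the child id and the parent id happen to hash to the same shard, which matches the "random" successes described in the question. With a single shard every key maps to shard 0, so the problem disappears.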

answered Nov 15 '22 05:11

imotov
