Problems retrieving child documents in Elasticsearch with parent/child mappings

We work with two types of documents in Elasticsearch (ES): items and slots, where each item is the parent of its slot documents. We create the index with the following command:

curl -XPOST 'localhost:9200/items' -d @itemsdef.json

where itemsdef.json has the following definition:

{
"mappings" : {
    "item" : {
        "properties" : {
            "id" : {"type" : "long" },
            "name" : {
                "type" : "string",
                "_analyzer" : "textIndexAnalyzer"   
            },
            "location" : {"type" : "geo_point" }
        }
    }
},
"settings" : {
    "analysis" : {
        "analyzer" : {

                "activityIndexAnalyzer" : {
                    "alias" : ["activityQueryAnalyzer"],
                    "type" : "custom",
                    "tokenizer" : "whitespace",
                    "filter" : ["trim", "lowercase", "asciifolding", "spanish_stop", "spanish_synonym"]
                },
                "textIndexAnalyzer" : {
                    "type" : "custom",
                    "tokenizer" : "whitespace",
                    "filter" : ["word_delimiter_impl", "trim", "lowercase", "asciifolding", "spanish_stop", "spanish_synonym"]
                },
                "textQueryAnalyzer" : {
                    "type" : "custom",
                    "tokenizer" : "whitespace",
                    "filter" : ["trim", "lowercase", "asciifolding", "spanish_stop"]
                }       
        },
        "filter" : {        
                "spanish_stop" : {
                    "type" : "stop",
                    "ignore_case" : true,
                    "enable_position_increments" : true,
                    "stopwords_path" : "analysis/spanish-stopwords.txt"
                },
                "spanish_synonym" : {
                    "type" : "synonym",
                    "synonyms_path" : "analysis/spanish-synonyms.txt"
                },
                "word_delimiter_impl" : {
                    "type" : "word_delimiter",
                    "generate_word_parts" : true,
                    "generate_number_parts" : true,
                    "catenate_words" : true,
                    "catenate_numbers" : true,
                    "split_on_case_change" : false                  
                }               
        }
    }
}
}

Then we add the child document definition using the following command:

curl -XPOST 'localhost:9200/items/slot/_mapping' -d @slotsdef.json

Where slotsdef.json has the following definition:

{
"slot" : {
    "_parent" : {"type" : "item"},
    "_routing" : {
        "required" : true,
        "path" : "parent_id"
    },
    "properties": {
        "id" : { "type" : "long" },
        "parent_id" : { "type" : "long" },
        "activity" : {
            "type" : "string",
            "_analyzer" : "activityIndexAnalyzer"
        },
        "day" : { "type" : "integer" },
        "start" : { "type" : "integer" },
        "end" :  { "type" : "integer" }
    }
}   
}

Finally we perform a bulk index with the following command:

curl -XPOST 'localhost:9200/items/_bulk' --data-binary @testbulk.json

Where testbulk.json holds the following data:

{"index":{"_type": "item", "_id":35}}
{"location":[40.4,-3.6],"id":35,"name":"A Name"}
{"index":{"_type":"slot","_id":126,"_parent":35}}
{"id":126,"start":1330,"day":1,"end":1730,"activity":"An Activity","parent_id":35}

The ES Head plugin shows that the definitions look correct. We tested the analyzers to confirm that they are loaded and working. Both documents are listed in the ES Head browser view. However, when we try to retrieve the child document through the API, ES responds that it does not exist:

$ curl -XGET 'localhost:9200/items/slot/126'
{"_index":"items","_type":"slot","_id":"126","exists":false}

When we import 50 documents, all parent documents can be retrieved through the API, but only some of the requests for child documents succeed.

My guess is that this is related to how documents are distributed across shards and to routing, which I do not fully understand.

Any clue how to retrieve individual child documents? ES Head shows that they have been stored, but HTTP GETs to localhost:9200/items/slot/XXX seemingly at random respond with "exists":false.

asked Nov 25 '12 01:11 by Daniel Cerecedo



1 Answer

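Child documents are routed by their parent's id, so each one is stored on its parent's shard rather than on the shard its own id would select. A GET without routing looks on the shard derived from the child's id, which is why it only succeeds when both ids happen to map to the same shard. To retrieve a child document, specify the parent id in the routing parameter of the request: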

curl "localhost:9200/items/slot/126?routing=35"
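
To see why the routing parameter matters, here is a rough sketch of shard selection. Elasticsearch picks the shard as a hash of the routing value modulo the number of primary shards, and the routing value defaults to the document id unless a parent or explicit routing is given. The shard_for helper below is hypothetical, and cksum is only a stand-in for the real hash function, so the shard numbers it prints will not match your cluster. I am also assuming 5 shards, which was the default at the time.

```shell
#!/bin/sh
# Hypothetical helper: maps a routing value to a shard number.
# cksum is a stand-in for Elasticsearch's internal hash function.
shard_for() {
  routing=$1
  shards=$2
  crc=$(printf '%s' "$routing" | cksum | cut -d ' ' -f 1)
  echo $(( crc % shards ))
}

# Slot 126 was indexed with its parent's id (35) as the routing value.
echo "stored on shard:            $(shard_for 35 5)"
# A plain GET /items/slot/126 routes by the child's own id instead.
echo "GET without routing checks: $(shard_for 126 5)"
# With a single shard, both always resolve to shard 0.
echo "single shard:               $(shard_for 35 1) $(shard_for 126 1)"
```

When the first two numbers differ, the unrouted GET looks on the wrong shard and returns "exists":false. With several shards the two ids only coincide by chance, which fits the partial successes described in the question.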

If the parent id is not available, you will have to search for the child document instead:

curl "localhost:9200/items/slot/_search?q=id:126"

Alternatively, switch to an index with a single shard, where every document lands on the same shard regardless of routing.
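
A sketch of the single-shard option, assuming the index can be deleted and recreated (the number of primary shards cannot be changed on an existing index). number_of_shards is a standard index setting; the single_shard_settings function name is made up for illustration, and the fragment would need to be merged into the existing "settings" object of itemsdef.json alongside the "analysis" block.

```shell
#!/bin/sh
# Hypothetical helper: prints the settings fragment for a one-shard index.
single_shard_settings() {
  cat <<'EOF'
{
  "settings" : {
    "index" : { "number_of_shards" : 1 }
  }
}
EOF
}

# Print the fragment so it can be merged into itemsdef.json by hand.
single_shard_settings
```

After recreating the index with this setting and re-running the bulk import, a GET to localhost:9200/items/slot/126 should work without a routing parameter, at the cost of keeping all primary data in one shard.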

answered Nov 15 '22 05:11 by imotov