For Elasticsearch document IDs, are there any character constraints or restrictions?
I am particularly interested in whether a forward slash '/' would cause any issues. I have some news feeds that I would like to index. The problem is that the database containing this data has the UID set to the URL of the news feed. Don't ask me why it was designed this way, because I haven't got a clue.
I want to use the same identifier (the URL) for the Elasticsearch document. I have successfully used GUIDs, alphanumeric and numeric characters without problems.
If I can't, what would be the best workaround? Should I encode the entire URL?
Thanks
ES document ids are always stored as strings, even if you give an integer at indexing time.
The auto-generated ID now includes a time-based element, but the byte reordering means it is not sortable by time. You could write a script field in Painless that puts the bytes back in sortable order and then sort on that.
By default, you cannot use from and size to page through more than 10,000 hits. This limit is a safeguard set by the index.max_result_window index setting. If you need to page through more than 10,000 hits, use the search_after parameter instead.
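As a rough illustration of search_after paging, the sketch below only builds the request bodies and does not talk to a cluster. The helper name next_page_body and the fields published_at and feed_url are made up for this example; the one thing it relies on is that each hit of a sorted search carries a "sort" array, which is passed back as search_after for the next page.

```python
import copy


def next_page_body(base_body, last_hit=None):
    """Build the search body for the next page (hypothetical helper).

    base_body must contain a "sort" clause, ideally ending in a unique
    tiebreaker field so that the ordering is stable between requests.
    """
    body = copy.deepcopy(base_body)
    # search_after replaces offset paging, so drop any "from" value.
    body.pop("from", None)
    if last_hit is not None:
        # Resume right after the sort values of the last hit already seen.
        body["search_after"] = last_hit["sort"]
    return body


# Field names here are assumptions for illustration only.
base = {
    "size": 100,
    "query": {"match_all": {}},
    "sort": [{"published_at": "asc"}, {"feed_url": "asc"}],
}

first_page = next_page_body(base)

# Shape of the last hit from the previous response (values are invented).
last_hit = {"_id": "abc", "sort": [1700000000000, "https://example.com/feed/42"]}
second_page = next_page_body(base, last_hit)
```

Each body would then be sent to the index's _search endpoint, repeating until a page comes back with no hits.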
To look up a document by its ID, query the _id field. For example, to find the document with the ID "12345", use the query _id:12345.
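When the ID is a URL, the query-string form gets awkward, because characters such as ':' and '/' are reserved in that syntax and would need escaping. One way around this is the ids query in a JSON request body. The sketch below only builds that body; ids_query is a hypothetical helper and the feed URL is a placeholder.

```python
import json


def ids_query(*doc_ids):
    """Build an ids query body for the given document IDs (hypothetical helper)."""
    # IDs are stored as strings, so normalise everything to str.
    return {"query": {"ids": {"values": [str(doc_id) for doc_id in doc_ids]}}}


body = ids_query("https://example.com/feeds/news.rss", 12345)
print(json.dumps(body, indent=2))
```

Since the ID travels inside the JSON body here, it needs no URL-encoding or query-string escaping.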
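There are no character constraints, so forward slashes can be used (newer Elasticsearch versions do limit the _id to 512 bytes, which a very long URL could exceed). However, to use such an ID in the REST API, it has to be URL-encoded: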
$ curl -XPUT "localhost:9200/id-test-index/rec/1+1%2F2" -H 'Content-Type: application/json' -d '{"field" : "one and a half"}'
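For the news-feed case, the same encoding can be done in code before building the request path. This is a minimal sketch using only Python's standard library. The helper doc_path and the example feed URL are assumptions; the index and type names are taken from the curl command above.

```python
from urllib.parse import quote, unquote


def doc_path(index, doc_type, doc_id):
    """Return a REST path with the document ID percent-encoded (hypothetical helper)."""
    # safe="" makes quote() encode '/' as well; by default it is left as-is.
    encoded_id = quote(doc_id, safe="")
    return f"/{index}/{doc_type}/{encoded_id}"


feed_url = "http://example.com/feeds/news.rss?id=1"  # placeholder URL
path = doc_path("id-test-index", "rec", feed_url)
print(path)
# /id-test-index/rec/http%3A%2F%2Fexample.com%2Ffeeds%2Fnews.rss%3Fid%3D1

# Decoding the last path segment gives back the original URL.
assert unquote(path.rsplit("/", 1)[1]) == feed_url
```

Only the ID segment is encoded, so the slashes that separate index, type and ID stay intact. Many client libraries apply this encoding for you, in which case the raw URL should be passed as the ID to avoid encoding it twice.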