I created an index in Elasticsearch with the following settings. After inserting data into the index using the Bulk API, the docs.deleted count is continuously increasing. Does this mean the documents are automatically getting deleted, and if so, what did I do wrong?
PUT /inc_index/
{
  "mappings": {
    "store": {
      "properties": {
        "title": {
          "type": "string",
          "term_vector": "with_positions_offsets_payloads",
          "store": true,
          "index_analyzer": "fulltext_analyzer"
        },
        "description": {
          "type": "string",
          "term_vector": "with_positions_offsets_payloads",
          "store": true,
          "index_analyzer": "fulltext_analyzer"
        },
        "category": {
          "type": "string"
        }
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    },
    "analysis": {
      "analyzer": {
        "fulltext_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "type_as_payload"
          ]
        }
      }
    }
  }
}
The output of "GET /_cat/indices?v"
is as shown below, the "docs.deleted"
is continuously increasing:
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open inc_index 5 1 2009877 584438 6.8gb 3.6gb
If your bulk operations also include updates to existing documents (i.e., indexing documents that reuse an existing ID), then this is normal. In Elasticsearch, an update is internally a delete followed by a re-index of the new version of the document: https://www.elastic.co/guide/en/elasticsearch/guide/current/update-doc.html
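As a minimal sketch (the index and type names come from the mapping above; the IDs and field values are hypothetical), a bulk payload that indexes the same _id twice would leave one live document and one copy marked as deleted:

POST /_bulk
{ "index": { "_index": "inc_index", "_type": "store", "_id": "1" } }
{ "title": "blue shoes", "description": "first version of the document", "category": "footwear" }
{ "index": { "_index": "inc_index", "_type": "store", "_id": "1" } }
{ "title": "blue shoes v2", "description": "second version overwrites the first", "category": "footwear" }

After this request, docs.count goes up by one, and docs.deleted also goes up by one for the overwritten first version.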
The deleted documents you see there are only marked as deleted. When Lucene segment merging happens, they are physically removed from disk.
Elasticsearch indices are composed of "segments". Since segments are write-once, deleting or updating a document does not actually remove it from its segment; the document is only marked as deleted, which increases the "docs.deleted" count.
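You can observe this per segment: the cat segments API (available on recent Elasticsearch versions) lists each segment of the index along with its live and deleted document counts:

GET /_cat/segments/inc_index?v

The docs.deleted column in its output shows how many tombstoned documents each segment is still carrying.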
More segments mean slower searches and more memory used, so Elasticsearch merges segments in the background: small segments are merged into bigger segments, which, in turn, are merged into even bigger ones. While merging, any documents marked as deleted are simply not copied into the new, larger segment. Once a merge has finished, the old segments are deleted, which is why you will also see the "docs.deleted" value decrease from time to time.
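If you want to reclaim that space sooner instead of waiting for background merges, you can ask Elasticsearch to expunge deleted documents yourself. This is an expensive operation, so use it sparingly on a live index. On current versions this is the force-merge API; on the 1.x versions that still accept index_analyzer (as in the mapping above), the same endpoint was called _optimize:

POST /inc_index/_forcemerge?only_expunge_deletes=true

Note that this is never required for correctness; it only accelerates the cleanup that merging performs anyway.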