
ElasticSearch - Delete documents by specific field

This seemingly simple task is not well documented in the ElasticSearch documentation:

We have an ElasticSearch instance with an index that has a field called sourceId. What API calls would I make first to GET all documents with 100 in the sourceId field (to verify the results before deletion), and then to DELETE those same documents?

asked Nov 12 '18 22:11 by Stpete111



1 Answer

You need two API calls here: the first to view the matching documents (or just their count), the second to perform the deletion.

The query is the same in both calls; only the endpoints differ. I'm also assuming that sourceId is mapped as type keyword.

Query to Verify

POST <your_index_name>/_search
{
  "size": 0,
  "query": {
    "term": {
      "sourceId": "100"
    }
  }
}

Execute the above term query and take note of hits.total in the response.

Remove "size": 0 from the query if you want the response to include the documents themselves (by default only the first 10 hits are returned).
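How hits.total looks depends on the Elasticsearch version: 6.x returns a plain number, while 7.x and later return an object with value and relation, and by default stop counting at 10,000 (relation is then "gte") unless the request sets "track_total_hits": true. A minimal Python sketch for reading either shape follows; the helper name and the sample responses are made up for illustration.

```python
def extract_total(search_response):
    """Return (count, is_exact) from the hits.total of a _search response.

    Elasticsearch 6.x reports hits.total as a plain number. 7.x and later
    report an object such as {"value": 4, "relation": "eq"}, where a
    relation of "gte" means the value is only a lower bound.
    """
    total = search_response["hits"]["total"]
    if isinstance(total, dict):
        return total["value"], total["relation"] == "eq"
    return total, True


# Hand-written, trimmed sample responses (not output from a real cluster).
response_6x = {"hits": {"total": 4, "hits": []}}
response_7x = {"hits": {"total": {"value": 4, "relation": "eq"}, "hits": []}}

print(extract_total(response_6x))
print(extract_total(response_7x))
```

If is_exact comes back False, the count is only a lower bound, so it is worth re-running the search with "track_total_hits": true before deleting anything.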

Once you have verified the results, perform the deletion with the same query. Note the different endpoint.

Query to Delete

POST <your_index_name>/_delete_by_query
{
  "query": {
    "term": {
      "sourceId": "100"
    }
  }
}

After the Delete By Query executes, check the deleted field in the response. It should show the same number as hits.total from the verification query.
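The two calls can also be scripted. Below is a rough Python sketch using only the standard library. It assumes an unauthenticated cluster reachable over plain HTTP; the function names, base URL, and index name are placeholders, not part of the original answer.

```python
import json
import urllib.request


def post_json(url, body):
    """Send a JSON POST request and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def verify_then_delete(base_url, index, field, value, post=post_json):
    """Count the matching documents, delete them, return (matched, deleted).

    `post` is injectable so the flow can be tried without a live cluster.
    """
    query = {"query": {"term": {field: value}}}

    # Step 1: run the query against _search with size 0 to get only the total.
    found = post(f"{base_url}/{index}/_search", dict(query, size=0))
    total = found["hits"]["total"]
    # 6.x returns a number; 7.x and later return {"value": ..., "relation": ...}.
    matched = total["value"] if isinstance(total, dict) else total

    # Step 2: send the identical query to the _delete_by_query endpoint.
    result = post(f"{base_url}/{index}/_delete_by_query", query)
    return matched, result["deleted"]
```

A call might look like verify_then_delete("http://localhost:9200", "my_index", "sourceId", "100"). In practice you would probably stop after step 1, inspect the count, and only then run step 2; the sketch chains them just to show that the query body is shared.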

I've used a term query, but you can use a match query or a more complex bool query instead. Just make sure the query is correct.
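As an illustration of the bool variant, here is a small Python sketch that builds a request body combining several exact-match conditions. The helper name and the status field are hypothetical; only sourceId comes from the question.

```python
import json


def bool_filter_query(exact_matches):
    """Build a bool query whose filter clauses must all match.

    exact_matches maps field names to exact values; each pair becomes
    one term clause.
    """
    clauses = [{"term": {field: value}} for field, value in exact_matches.items()]
    return {"query": {"bool": {"filter": clauses}}}


# "status" is a made-up field used only for illustration.
body = bool_filter_query({"sourceId": "100", "status": "archived"})
print(json.dumps(body, indent=2))
```

The resulting body can be sent unchanged to both the _search and the _delete_by_query endpoints, so the verify-first step works the same way as with the single term query.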

Hope it helps!

answered Nov 26 '22 12:11 by Kamal