 

ElasticSearch: Delete field from all documents where it exists (with Painless?)

Situation: I have an index with a strict mapping, and I want to delete an old field from it that is no longer used. So I create a new index with a mapping that doesn't include that field and try to reindex the data into the new index.

Problem: When I reindex, I get an error because the documents contain a field that is not in the new mapping. To solve this, I want to remove that field from all documents in the original index before reindexing.

PUT old_index/_doc/1
{
    "field_to_delete" : 5
}
PUT old_index/_doc/2
{
    "field_to_delete" : null
}
POST _reindex
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}
"reason": "mapping set to strict, dynamic introduction of [field_to_delete] within [new_index] is not allowed"
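To make the failure concrete, here is a minimal Python model (not Elasticsearch code) of how a strict mapping treats incoming _source keys. It assumes, as the error above and the rest of this question indicate, that a key missing from the mapping is rejected even when its value is null. The function name, the error text and the empty mapping set are hypothetical simplifications.

```python
# Hypothetical model of a strict mapping: any _source key the mapping does not
# declare is rejected, whatever its value (None stands in for JSON null).
NEW_MAPPING_FIELDS = set()  # assumption: new_index no longer declares field_to_delete


def strict_reject_reason(source, mapped_fields):
    """Return an error string for the first unmapped key, or None if accepted."""
    for key in source:
        if key not in mapped_fields:
            return f"dynamic introduction of [{key}] is not allowed"
    return None


docs = {
    "1": {"field_to_delete": 5},
    "2": {"field_to_delete": None},
}

for doc_id, source in docs.items():
    # Both documents are rejected: the key itself is the problem, not its value.
    print(doc_id, strict_reject_reason(source, NEW_MAPPING_FIELDS))
```

The takeaway is that the cleanup has to remove the key from every document that has it, including documents where it is null.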

1. Some places I found suggest doing:

POST old_index/_doc/_update_by_query
{
  "script": "ctx._source.remove('field_to_delete')",
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "field_to_delete"
          }
        }
      ]
    }
  }
}

However, the exists query doesn't match documents whose field holds an explicit null value, so the second document keeps the field and reindexing still fails after this update.
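A simplified Python model of approach 1 shows why document 2 is left behind. It assumes the documented behaviour of the exists query, which does not match a field whose value is null, [] or a list containing only nulls; other non-matching cases (for example unindexed fields) are ignored here, and the helper name is hypothetical.

```python
def exists_query_matches(source, field):
    """Simplified model of the `exists` query: a field only 'exists' if it has
    at least one non-null value. Missing keys, None, [] and [None] all miss."""
    value = source.get(field)
    if value is None:
        return False
    if isinstance(value, list):
        return any(v is not None for v in value)
    return True


docs = {
    "1": {"field_to_delete": 5},
    "2": {"field_to_delete": None},
}

# Approach 1: remove the field only from documents the exists query matches.
for source in docs.values():
    if exists_query_matches(source, "field_to_delete"):
        source.pop("field_to_delete")

print(docs)  # {'1': {}, '2': {'field_to_delete': None}}: document 2 still has the key
```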

2. Others (like members of the Elastic team in their official forum) suggest doing something like:

POST old_index/_doc/_update_by_query
{
  "script": {
    "source": """
      if (ctx._source.field_to_delete != null) {
        ctx._source.remove("field_to_delete");
      } else {
        ctx.op = "noop";
      }
    """
  },
  "query": {
    "match_all": {}
  }
}

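However, this has the same problem: ctx._source.field_to_delete evaluates to null both when the field is missing and when it holds an explicit null, so the second document is treated as a noop and keeps the field.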
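The same two documents run through a Python model of approach 2 make the problem visible. The model assumes that reading a missing key from ctx._source yields null, just like a key that stores null, which is what dict.get returning None imitates here. "index" stands for the default update_by_query operation; the helper name is hypothetical.

```python
def painless_not_null_check(source, field):
    """Model of `ctx._source.field_to_delete != null`: on a Map, a missing key
    and a key holding null both read as null, so neither passes the check."""
    return source.get(field) is not None


docs = {
    "1": {"field_to_delete": 5},
    "2": {"field_to_delete": None},
}

ops = {}
for doc_id, source in docs.items():
    if painless_not_null_check(source, "field_to_delete"):
        source.pop("field_to_delete")
        ops[doc_id] = "index"  # document is rewritten
    else:
        ops[doc_id] = "noop"   # ctx.op = "noop": document is left untouched

print(ops)   # {'1': 'index', '2': 'noop'}
print(docs)  # {'1': {}, '2': {'field_to_delete': None}}
```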

3. In the end I could just do:

POST old_index/_doc/_update_by_query
{
  "script": {
    "source": "ctx._source.remove('field_to_delete')"
  },
  "query": {
    "match_all": {}
  }
}

But this rewrites every document in the index, which for a large index could mean additional downtime during deployment.

asked Jun 18 '19 12:06 by pmishev




1 Answer

Eventually I found the correct way to do it, so I'm sharing it for general knowledge. The key is containsKey(), which returns true even when the field's value is null:

POST old_index/_doc/_update_by_query
{
  "script": {
    "source": """
        if (ctx._source.containsKey("field_to_delete")) {
          ctx._source.remove("field_to_delete");
        } else {
          ctx.op = "noop";
        }
      """
  },
  "query": {
    "match_all": {}
  }
}
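The Python model used for the other approaches, now applying the containsKey logic, shows why this version works. Key membership (the in operator) plays the role of containsKey, so a key holding null still counts as present. Document 3 is a hypothetical document that never had the field, added only to show the noop path; the helper name is also hypothetical.

```python
def update_doc(source, field):
    """Model of the answer's script: containsKey is true even when the stored
    value is null, so the key is removed; otherwise the update is a noop."""
    if field in source:      # ctx._source.containsKey("field_to_delete")
        source.pop(field)    # ctx._source.remove("field_to_delete")
        return "index"
    return "noop"            # ctx.op = "noop"


docs = {
    "1": {"field_to_delete": 5},
    "2": {"field_to_delete": None},
    "3": {"other_field": "x"},  # hypothetical document without the field
}

ops = {doc_id: update_doc(src, "field_to_delete") for doc_id, src in docs.items()}
print(ops)   # {'1': 'index', '2': 'index', '3': 'noop'}
print(docs)  # no document carries field_to_delete any more
```

The match_all query still visits every document, but only documents that actually carry the key are written back; the rest are reported as noops in the update_by_query response.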
answered Sep 28 '22 06:09 by pmishev