How to copy some Elasticsearch data to a new index

Let's say I have movie data in Elasticsearch, and I created the documents like this:

curl -XPUT "http://192.168.0.2:9200/movies/movie/1" -d'
{
    "title": "The Godfather",
    "director": "Francis Ford Coppola",
    "year": 1972
}'

I have a bunch of movies from different years. I want to copy all the movies from a particular year (say, 1972) to a new index called "70sMovies", but I can't see how to do that.

cybergoof asked Aug 05 '14 16:08



3 Answers

Since Elasticsearch 2.3 you can use the built-in _reindex API.

For example, to copy a whole index:

POST /_reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}

Or copy only part of the index by adding a query to the source:

POST /_reindex
{
  "source": {
    "index": "twitter",
    "query": {
      "term": {
        "user": "kimchy"
      }
    }
  },
  "dest": {
    "index": "new_twitter"
  }
}

Read more: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
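
Applied to the movies from the question, the filtered request might look like the sketch below. It assumes the host and the "movies" index from the question and that "year" is indexed as a number. It uses "70smovies" as the target because Elasticsearch index names must be lowercase, so "70sMovies" would be rejected. The helper names reindex_body and copy_year are made up for illustration.

```shell
#!/bin/sh
# Assumption: host taken from the question; change it to match your cluster.
ES_URL="http://192.168.0.2:9200"

# Print a _reindex body that copies only documents whose "year" equals $1
# from the "movies" index into the index named by $2.
reindex_body() {
  cat <<EOF
{
  "source": {
    "index": "movies",
    "query": { "term": { "year": $1 } }
  },
  "dest": { "index": "$2" }
}
EOF
}

# Send the body to _reindex. The Content-Type header is required
# from Elasticsearch 6.0 onwards and harmless on older versions.
copy_year() {
  reindex_body "$1" "$2" |
    curl -s -XPOST "$ES_URL/_reindex" \
      -H 'Content-Type: application/json' \
      --data-binary @-
}

# Usage (target index name must be lowercase):
#   copy_year 1972 70smovies
```

To copy the whole decade rather than a single year, the term query could be swapped for a range query on "year" with "gte": 1970 and "lte": 1979.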

Ludo - Off the record answered Nov 07 '22 21:11


The best approach would be to use the elasticsearch-dump tool: https://github.com/taskrabbit/elasticsearch-dump

Here is a real-world example I used. It copies the mapping first and then the data:

elasticdump \
  --input=http://localhost:9700/.kibana \
  --output=http://localhost:9700/.kibana_read_only \
  --type=mapping
elasticdump \
  --input=http://localhost:9700/.kibana \
  --output=http://localhost:9700/.kibana_read_only \
  --type=data
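
The example above copies the entire index. For the question's case of copying only one year, elasticdump has a --searchBody option that restricts the data dump to documents matching a query. The following is a rough sketch, not a tested recipe: the URLs, the index names, and the copy_filtered helper are assumptions for illustration, and the exact quoting of --searchBody may vary between elasticdump versions.

```shell
#!/bin/sh
# Hypothetical endpoints and index names; adjust them to your cluster.
# The target name is lowercase because Elasticsearch requires it.
SRC="http://localhost:9200/movies"
DEST="http://localhost:9200/70smovies"

# Query selecting only the movies from 1972.
QUERY='{"query":{"term":{"year":1972}}}'

copy_filtered() {
  # Copy the mapping first so the destination fields keep their types,
  # then copy only the documents that match the query.
  elasticdump --input="$SRC" --output="$DEST" --type=mapping &&
    elasticdump --input="$SRC" --output="$DEST" --type=data \
      --searchBody="$QUERY"
}

# Usage:
#   copy_filtered
```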

MAQ answered Nov 07 '22 20:11


Check out knapsack: https://github.com/jprante/elasticsearch-knapsack

Once you have the plugin installed and working, you can export part of your index with a query. For example:

curl -XPOST 'localhost:9200/test/test/_export' -d '{
"query" : {
    "match" : {
        "myfield" : "myvalue"
    }
},
"fields" : [ "_parent", "_source" ]
}'

This will create a tarball with only your query results, which you can then import into another index.

coffeeaddict answered Nov 07 '22 20:11