Renaming fields to new index in Elasticsearch

I have an index with this mapping:

curl -XPUT 'http://localhost:9200/origindex/_mapping/page' -d '
{
    "page" : {
        "properties" : {
            "title" : {"type" : "text"},
            "body" : {"type" : "text"},
            "other": {"type": "text"}
        }
    }
}'

In a new index, I want to copy "title" to "title1" and "title2", and "body" to "body1" and "body2" (disregarding "other"), and change the type from "page" to "articles_eng". The new index has this mapping:

curl -XPUT 'http://localhost:9200/newindex/_mapping/articles_eng' -d '
{
    "articles_eng" : {
        "properties" : {
            "title1" : {
                "type" : "text",
                "analyzer" : "my_analyzer1"
            },
            "title2" : {
                "type" : "text",
                "analyzer": "my_analyzer2"
            },
            "body1": {
                "type" : "text",
                "analyzer": "my_analyzer1"
            },
            "body2" : {
                "type" : "text",
                "analyzer": "my_analyzer2"
            }
        }
    }
}'

Based on this answer and the Elasticsearch reindex docs, I came up with something like this:

curl -XPOST http://localhost:9200/_reindex -d '{                                                   
    "source": {                                                                                    
        "index": "origindex",                                                                          
        "type": "page",                                                                            
        "query": {                                                                                 
           "match_all": {}                                                                         
        },                                                                                         
        "_source": [ "title", "body" ]                                                             
    },                                                                                             
    "dest": {                                                                                      
        "index": "newindex"                                                                        
    },                                                                                             
    "script": {                                                                                    
        "inline": "ctx._type = \"articles_eng\"";                                                  
                  "ctx._title1 = ctx._source._title";                                         
                  "ctx._title2 = ctx._source._title";                                       
                  "ctx._body1 = ctx._source._body";                                          
                  "ctx._body2 = ctx._source._body"                                                                                                   
    }                                                                                              
}'

I'm having trouble with the script lines. If I keep only the first line (changing the document type), everything works fine. If I add the remaining lines, I get the error

"[reindex] failed to parse field [script]"

caused by

"Unexpected character (';' (code 59)): was expecting comma to separate Object entries\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@37649463; line: 14, column: 50]"

Even if I can sort out the issue with the multiple statements, putting in just the second line gives me the error

"Invalid fields added to context [title1]"}]

Can anyone help me out? It seems like this shouldn't be impossible to do.

asked Feb 23 '17 18:02 by kslnet


1 Answer

If I do only the top line (changing the document type), everything works fine. If I add the rest of the lines, I get an error

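You don't need to wrap each inline statement in its own double-quoted string. Instead, put all the script statements in a single double-quoted string, separated by semicolons (;), as shown below. Using ctx._source.remove(...) for the second copy of each field also deletes the original title and body fields, so they are not carried over into newindex: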

"script": {
    "inline": "ctx._source.title1 = ctx._source.title; ctx._source.title2 = ctx._source.remove(\"title\");ctx._source.body1 = ctx._source.body; ctx._source.body2 = ctx._source.remove(\"body\");ctx._type=\"articles_eng\""
}
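To see why the original request fails before any script runs: the "Unexpected character (';')" error comes from JSON parsing, not from the scripting language. After a string value, a JSON object expects a comma or a closing brace, so the `;` between the separately quoted statements is rejected. A minimal Python sketch using the standard `json` module reproduces this locally; the two payloads are trimmed-down, hypothetical versions of the `script` objects above:

```python
import json

# Trimmed-down version of the question's "script" object: each statement is
# its own quoted string and they are separated by ';', which is not valid JSON.
broken = '''{
    "inline": "ctx._type = \\"articles_eng\\"";
              "ctx._title1 = ctx._source._title"
}'''

# The answer's approach: a single string, with ';' separating statements inside it.
fixed = '''{
    "inline": "ctx._source.title1 = ctx._source.title; ctx._type = \\"articles_eng\\""
}'''


def is_valid_json(text):
    """Return True if text parses as JSON, otherwise print the error and return False."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError as err:
        print(f"parse error: {err.msg} (line {err.lineno}, column {err.colno})")
        return False


print(is_valid_json(broken))  # prints the parse error, then False
print(is_valid_json(fixed))   # True
```

The semicolons only mean something to the script once they sit inside the JSON string value.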

Even if I can sort out the issue with the multiple statements, putting in just the second line gives me the error

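You are accessing the source fields the wrong way. Metadata fields (like _id, _type, _index, ...) are accessed directly on ctx, as ctx._type or ctx._id, whereas source fields (title, body and other in your case) must be accessed through ctx._source, as ctx._source.title or ctx._source.body. Assigning to something like ctx._title1 tries to add a new top-level field to the context, which reindex rejects with the "Invalid fields added to context" error.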
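The distinction between metadata fields and source fields can be modeled with plain dictionaries. The sketch below is only an illustration in Python, not Painless: `ctx` is a hypothetical stand-in for the reindex context, with metadata at the top level and document fields under `_source`, and `dict.pop` plays the role of the `remove` call used in the script (both return the value and delete the key).

```python
import copy


def apply_script(ctx):
    """Mirror the answer's inline script on a plain dict (illustration only)."""
    src = ctx["_source"]               # document fields live under _source
    src["title1"] = src["title"]
    src["title2"] = src.pop("title")   # returns the value and deletes "title"
    src["body1"] = src["body"]
    src["body2"] = src.pop("body")     # returns the value and deletes "body"
    ctx["_type"] = "articles_eng"      # metadata fields live at the top level
    return ctx


# Hypothetical document; "other" is already excluded by the _source filter.
ctx = {
    "_index": "origindex",
    "_type": "page",
    "_id": "1",
    "_source": {"title": "Hello", "body": "World"},
}

# There is no top-level "_title": document fields are not metadata.
print(ctx.get("_title"))          # None

result = apply_script(copy.deepcopy(ctx))
print(result["_type"])            # articles_eng
print(sorted(result["_source"]))  # ['body1', 'body2', 'title1', 'title2']
```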

Putting it all together, your reindex request should look like this:

POST _reindex
{
  "source": {
    "index": "origindex",
    "_source": [ "title", "body" ]
  },
  "dest": {
    "index": "newindex"
  },
  "script": {
    "inline": "ctx._source.title1 = ctx._source.title; ctx._source.title2 = ctx._source.remove(\"title\");ctx._source.body1 = ctx._source.body; ctx._source.body2 = ctx._source.remove(\"body\");ctx._type=\"articles_eng\""
  }
}
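If you generate this request from code rather than typing it by hand, letting a JSON serializer build the body avoids the quoting problems from the question. A minimal sketch in Python, assuming you only need the request body (the variable names are hypothetical; sending it is left to curl or whatever HTTP client you use):

```python
import json

# Each script statement as a separate Python string; joining them with "; "
# puts the whole script into a single JSON string value.
statements = [
    'ctx._source.title1 = ctx._source.title',
    'ctx._source.title2 = ctx._source.remove("title")',
    'ctx._source.body1 = ctx._source.body',
    'ctx._source.body2 = ctx._source.remove("body")',
    'ctx._type = "articles_eng"',
]

body = {
    "source": {"index": "origindex", "_source": ["title", "body"]},
    "dest": {"index": "newindex"},
    "script": {"inline": "; ".join(statements)},
}

# json.dumps escapes the inner double quotes as \" automatically.
payload = json.dumps(body, indent=2)
print(payload)
```

The printed JSON can be saved to a file and sent with curl's `-d @file` option against the `_reindex` endpoint.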

Hope this helps!

answered Nov 12 '22 07:11 by avr