Convert IP (string) into long in elasticsearch/kibana scripted fields

Tags:

elasticsearch

elasticsearch-painless

I have a field in a document, named "originating_ip", that holds the string representation of an IPv4 address (e.g. "1.2.3.4"). I'm trying to use scripted fields in the Painless language to add a new field (originating_ip_calc) holding the integer (long) representation of that IPv4 address.

The following script works in Groovy (and as far as I understand, Painless should behave almost the same), but "almost" apparently does not cover this case.

String[] ipAddressInArray = "1.2.3.4".split("\\.");

long result = 0;
for (int i = 0; i < ipAddressInArray.length; i++) {
    int power = 3 - i;
    int ip = Integer.parseInt(ipAddressInArray[i]);
    long longIP = (ip * Math.pow(256, power)).toLong();
    result = result + longIP;
}
return result;

I also looked at this question, and as you can see, the code above is based on one of the answers there.

I also tried InetAddress, but with no luck.
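For comparison, here is a minimal plain-Java sketch (not Painless) of the InetAddress route. As far as I know, Painless only whitelists a subset of the JDK and `java.net` is not part of it, which would explain why InetAddress did not work in a script; treat that as an assumption and check the Painless API reference for your version. The class and method names are hypothetical.

```java
// Plain-Java sketch, NOT Painless: as far as I know java.net is not
// whitelisted for Painless scripts. Class/method names are hypothetical.
class InetAddressToLong {

    static long ipv4ToLong(String ip) {
        byte[] octets;
        try {
            // For a literal address like "1.2.3.4", getByName just parses it;
            // a hostname would trigger a DNS lookup instead.
            octets = java.net.InetAddress.getByName(ip).getAddress();
        } catch (java.net.UnknownHostException e) {
            throw new IllegalArgumentException("cannot parse address: " + ip, e);
        }
        if (octets.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        long result = 0;
        for (byte b : octets) {
            // Java bytes are signed, so mask to 0..255 before appending the octet.
            result = (result << 8) | (b & 0xFF);
        }
        return result;
    }
}
```

Note that `InetAddress.getByName` only skips DNS for literal addresses; given a hostname it performs a lookup, so this is not suitable for untrusted input as written.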

asked Nov 04 '18 15:11

Dekel

1 Answer

With Elasticsearch Painless scripting, you can use code like the following:

POST ip_search/doc/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "originating_ip_calc": {
      "script": {
        "source": """
String ip_addr = params['_source']['originating_ip'];
def ip_chars = ip_addr.toCharArray();
int chars_len = ip_chars.length;
long result = 0;
int cur_power = 0;         // weight of the current octet is 256^cur_power
int last_dot = chars_len;  // exclusive end index of the current octet
// Scan right to left; i == -1 is a sentinel that flushes the leftmost octet
for (int i = chars_len - 1; i >= -1; i--) {
  if (i == -1 || ip_chars[i] == (char) '.') {
    result += (Integer.parseInt(ip_addr.substring(i + 1, last_dot)) * Math.pow(256, cur_power));
    last_dot = i;
    cur_power += 1;
  }
}
return result;
""",
        "lang": "painless"
      }
    }
  },
  "_source": ["originating_ip"]
}

(Note that I used the Kibana Console to send the request to ES. Console accepts the triple-quoted """ script and escapes it into a valid JSON string before sending.)
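To check the logic outside Elasticsearch, the script can be ported almost line for line to plain Java. This is a sketch for testing, not something Elasticsearch runs, and the class and method names are hypothetical. The one visible difference is that in Java `'.'` is already a `char`, so no cast is needed.

```java
// Plain-Java port of the Painless script, for checking the arithmetic.
// Names are hypothetical; the structure mirrors the script.
class PainlessScriptPort {

    static long ipToLong(String ipAddr) {
        char[] ipChars = ipAddr.toCharArray();
        int charsLen = ipChars.length;
        long result = 0;
        int curPower = 0;        // rightmost octet has weight 256^0
        int lastDot = charsLen;  // exclusive end index of the current octet
        for (int i = charsLen - 1; i >= -1; i--) {
            // i == -1 flushes the leftmost octet; || short-circuits,
            // so ipChars[-1] is never read.
            if (i == -1 || ipChars[i] == '.') {
                // Compound assignment narrows the double product back to long,
                // as in the script.
                result += Integer.parseInt(ipAddr.substring(i + 1, lastDot))
                        * Math.pow(256, curPower);
                lastDot = i;
                curPower += 1;
            }
        }
        return result;
    }
}
```

Like the script, this does no validation: a malformed input such as "1.2.3" is silently converted (to 66051) instead of being rejected.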

Sending this request returns a response like this:

"hits": [
  {
    "_index": "ip_search",
    "_type": "doc",
    "_id": "2",
    "_score": 1,
    "_source": {
      "originating_ip": "10.0.0.1"
    },
    "fields": {
      "originating_ip_calc": [
        167772161
      ]
    }
  },
  {
    "_index": "ip_search",
    "_type": "doc",
    "_id": "1",
    "_score": 1,
    "_source": {
      "originating_ip": "1.2.3.4"
    },
    "fields": {
      "originating_ip_calc": [
        16909060
      ]
    }
  }
]
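Both values can be checked by hand. Writing the address as $o_1.o_2.o_3.o_4$, the script computes a base-256 number in which the rightmost octet gets weight $256^0$:

```latex
\begin{aligned}
\text{value} &= o_1 \cdot 256^3 + o_2 \cdot 256^2 + o_3 \cdot 256^1 + o_4 \cdot 256^0 \\
\texttt{1.2.3.4} &\mapsto 1 \cdot 16777216 + 2 \cdot 65536 + 3 \cdot 256 + 4 = 16909060 \\
\texttt{10.0.0.1} &\mapsto 10 \cdot 16777216 + 0 + 0 + 1 = 167772161
\end{aligned}
```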

But why does it have to be this way?

Why does the approach with `.split` not work?

If you send the code from the question to ES, it replies with an error like this:

      "script": "String[] ipAddressInArray = \"1.2.3.4\".split(\"\\\\.\");\n\nlong result = 0;\nfor (int i = 0; i < ipAddressInArray.length; i++) {\n    int power = 3 - i;\n    int ip = Integer.parseInt(ipAddressInArray[i]);\n    long longIP = (ip * Math.pow(256, power)).toLong();\n    result = result + longIP;\n}\nreturn result;",
      "lang": "painless",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Unknown call [split] with [1] arguments on type [String]."

This is mainly because Java's `String.split()` is not considered safe to use in Painless: it compiles a regex Pattern implicitly. The suggested alternative is `Pattern#split`, but that requires regexes to be enabled on the node via the `script.painless.regex.enabled` setting in elasticsearch.yml, not per index.

By default (at least in the Elasticsearch version used here), they are disabled:

      "script": "String[] ipAddressInArray = /\\./.split(\"1.2.3.4\");...
      "lang": "painless",
      "caused_by": {
        "type": "illegal_state_exception",
        "reason": "Regexes are disabled. Set [script.painless.regex.enabled] to [true] in elasticsearch.yaml to allow them. Be careful though, regexes break out of Painless's protection against deep recursion and long loops."
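If enabling regexes is not an option, the string can be split without any regex by scanning for the separator character. A minimal plain-Java sketch of that idea follows; the helper name `splitOnChar` is hypothetical, and before porting it you should check that each `String` method it uses is available in your Painless version.

```java
// Plain-Java sketch: split on a literal character without any regex.
// The class and method names are hypothetical.
class LiteralSplit {

    static String[] splitOnChar(String s, char sep) {
        // First pass: count separators to size the result array.
        int parts = 1;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == sep) {
                parts++;
            }
        }
        String[] result = new String[parts];
        int start = 0;
        int k = 0;
        // Second pass: cut the string at each separator.
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == sep) {
                result[k++] = s.substring(start, i);
                start = i + 1;
            }
        }
        result[k] = s.substring(start); // token after the last separator
        return result;
    }
}
```

Unlike `String.split`, this keeps trailing empty tokens ("1.2." gives "1", "2", ""). It also shows why the regex matters in plain Java: `"1.2.3.4".split(".")` returns an empty array, because an unescaped `.` is a regex matching every character and all the resulting empty tokens are trailing, so they are dropped.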

Why do we have to do an explicit cast `(char) '.'`?

So we have to split the string on dots manually. The straightforward approach is to compare each character of the string with '.', which in Java is a `char` literal, not a `String`.

In Painless, however, '.' is a `String` literal (both single and double quotes delimit strings). So we need an explicit cast to `char`, because we are comparing it against elements of a `char` array.
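In plain Java the comparison needs no cast. The sketch below mimics the Painless situation with a hypothetical `toChar` helper standing in for the `(char)` cast of a one-character `String`. As I understand the Painless casting rules, that cast only succeeds for strings of exactly one character, and the helper enforces the same; throwing `ClassCastException` is my choice here, not necessarily what Painless throws.

```java
// Plain-Java illustration; the class and both helpers are hypothetical.
class CharCast {

    // Stand-in for Painless's (char) cast of a String:
    // only a String of exactly one character can be converted.
    static char toChar(String s) {
        if (s.length() != 1) {
            throw new ClassCastException("cannot cast a String of length " + s.length() + " to char");
        }
        return s.charAt(0);
    }

    // Counts separators the way the answer's loop detects them:
    // each element of the char array is compared with the cast separator.
    static int countSeparators(char[] chars, String separator) {
        char sep = toChar(separator); // plays the role of (char) '.'
        int count = 0;
        for (char c : chars) {
            if (c == sep) { // char-to-char comparison
                count++;
            }
        }
        return count;
    }
}
```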

Why do we have to work with char array directly?

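Because `.length` does not work on a `String` either: in Painless, as in Java, `.length` is array syntax, and `String` has no such field (it offers the `length()` method instead). Working with the `char` array from `toCharArray()` sidesteps this and allows plain index access: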

    "reason": {
      "type": "script_exception",
      "reason": "compile error",
      "script_stack": [
        "\"1.2.3.4\".length",
        "         ^---- HERE"
      ],
      "script": "\"1.2.3.4\".length",
      "lang": "painless",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Unknown field [length] for type [String]."
      }
    }
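For contrast, in Java `String` exposes `length()` as a method, and the same conversion can be written with `length()` and `charAt()` and no `char` array. The sketch also swaps the answer's `Math.pow` weighting for left-to-right accumulation (`result = result * 256 + octet`), which stays in integer arithmetic; that is a different technique, not the answer's method. It is plain Java: I believe `length()` and `charAt()` are in the Painless `String` whitelist as well, but verify that for your version before porting. The class name is hypothetical, and the code assumes a well-formed dotted-quad input.

```java
// Plain-Java sketch without toCharArray(). Assumes well-formed input
// such as "1.2.3.4"; no validation is done. Class name is hypothetical.
class CharAtVariant {

    static long ipToLong(String ip) {
        long result = 0;
        int octet = 0;
        for (int i = 0; i < ip.length(); i++) { // length() is a method on String
            char c = ip.charAt(i);
            if (c == '.') {
                // End of an octet: shift the running value left by one byte.
                result = result * 256 + octet;
                octet = 0;
            } else {
                // Accumulate the decimal digits of the current octet.
                octet = octet * 10 + (c - '0');
            }
        }
        // The last octet has no trailing dot, so add it after the loop.
        return result * 256 + octet;
    }
}
```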

So why is it called `painless`?

I couldn't find any historical note on the name after a quick search, but from the documentation and some experience (like the examples in this answer), I infer that it is designed to be painless to use in production.

Its predecessor, Groovy, was a ticking bomb because of its resource usage and security vulnerabilities. So the Elasticsearch team created a very limited subset of Java/Groovy scripting, with predictable performance and without those security vulnerabilities, and called it Painless.

If there is one thing that is true about the Painless scripting language, it is that it is limited and sandboxed.

answered Nov 30 '22 22:11

Nikolay Vasiliev
