 

Convert IP (string) into long in elasticsearch/kibana scripted fields

I have a field in a document, named "originating_ip", that holds the string representation of an IPv4 address ("1.2.3.4"). I'm trying to use scripted fields with the Painless language to add a new field (originating_ip_calc) that holds the integer (long) representation of that IPv4 address.

The following script works in Groovy (and from what I understand, Painless should behave almost the same), but it seems that "almost" is not enough in this specific case.

String[] ipAddressInArray = "1.2.3.4".split("\\.");

long result = 0;
for (int i = 0; i < ipAddressInArray.length; i++) {
    int power = 3 - i;
    int ip = Integer.parseInt(ipAddressInArray[i]);
    long longIP = (ip * Math.pow(256, power)).toLong();
    result = result + longIP;
}
return result;

I also looked at this question, and as you can see, the code above is based on one of the answers there.

I also tried working with InetAddress, but with no luck.

asked Nov 04 '18 at 15:11 by Dekel
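For reference, the value being computed is octet1 × 256³ + octet2 × 256² + octet3 × 256 + octet4. Below is a minimal plain-Java sketch (not Painless) that does the same with bit shifts instead of Math.pow, assuming a well-formed dotted-quad string. The class and method names are hypothetical, and it uses String.split, which the answer below explains Painless rejects.

```java
// Plain Java, not Painless: dotted-quad IPv4 string -> numeric value.
// Assumes well-formed input such as "1.2.3.4" (no validation).
class IpToLong {
    static long ipToLong(String ip) {
        String[] octets = ip.split("\\."); // regex split: fine in Java
        long result = 0;
        for (String octet : octets) {
            // Shift the octets read so far left by 8 bits, then append this one
            result = (result << 8) | Integer.parseInt(octet);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ipToLong("1.2.3.4"));  // 16909060
        System.out.println(ipToLong("10.0.0.1")); // 167772161
    }
}
```

Shifting by 8 bits is the same as multiplying by 256, and keeping everything in a long avoids the double arithmetic that Math.pow introduces.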




1 Answer

With Elasticsearch Painless scripting, you can use code like the following:

POST ip_search/doc/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "originating_ip_calc": {
      "script": {
        "source": """
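String ip_addr = params['_source']['originating_ip'];
// Work on the char array so each character can be compared with '.'
def ip_chars = ip_addr.toCharArray();
int chars_len = ip_chars.length;
long result = 0;
int cur_power = 0;        // power of 256 for the octet being read
int last_dot = chars_len; // exclusive end index of the octet being read
// Scan right to left; i == -1 flushes the leftmost octet
for (int i = chars_len - 1; i >= -1; i--) {
  if (i == -1 || ip_chars[i] == (char) '.') {
    result += (Integer.parseInt(ip_addr.substring(i + 1, last_dot)) * Math.pow(256, cur_power));
    last_dot = i;
    cur_power += 1;
  }
}
return result;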
""",
        "lang": "painless"
      }
    }
  },
  "_source": ["originating_ip"]
}

(Note that I used the Kibana console to send the request to ES; it escapes the triple-quoted script so that the request body is valid JSON before sending it.)
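To make the script's right-to-left scan easier to follow, here is a plain-Java transliteration of the same loop. It is a sketch, not Painless code: the class name is hypothetical, and Math.pow is swapped for a running long multiplier (an assumption of mine, to keep the arithmetic integral), but the index handling is the same as in the script.

```java
// Plain-Java transliteration of the Painless script's right-to-left scan.
class ScanIp {
    static long ipToLong(String ipAddr) {
        char[] ipChars = ipAddr.toCharArray();
        int charsLen = ipChars.length;
        long result = 0;
        long multiplier = 1;    // 256^curPower, kept as a long instead of Math.pow
        int lastDot = charsLen; // exclusive end index of the octet being read
        // Walk from the last character down to index -1; i == -1 flushes the leftmost octet
        for (int i = charsLen - 1; i >= -1; i--) {
            if (i == -1 || ipChars[i] == '.') {
                result += Integer.parseInt(ipAddr.substring(i + 1, lastDot)) * multiplier;
                lastDot = i;
                multiplier *= 256;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ipToLong("10.0.0.1")); // 167772161
        System.out.println(ipToLong("1.2.3.4"));  // 16909060
    }
}
```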

The search request returns a response like this:

"hits": [
  {
    "_index": "ip_search",
    "_type": "doc",
    "_id": "2",
    "_score": 1,
    "_source": {
      "originating_ip": "10.0.0.1"
    },
    "fields": {
      "originating_ip_calc": [
        167772161
      ]
    }
  },
  {
    "_index": "ip_search",
    "_type": "doc",
    "_id": "1",
    "_score": 1,
    "_source": {
      "originating_ip": "1.2.3.4"
    },
    "fields": {
      "originating_ip_calc": [
        16909060
      ]
    }
  }
]
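As a quick check on those values, each octet is weighted by a power of 256 counted from the right, which is what cur_power tracks in the script:

```latex
\[
\mathrm{value} = o_1 \cdot 256^3 + o_2 \cdot 256^2 + o_3 \cdot 256 + o_4
\]
\[
\text{1.2.3.4}:\quad 1 \cdot 16777216 + 2 \cdot 65536 + 3 \cdot 256 + 4 = 16909060
\]
\[
\text{10.0.0.1}:\quad 10 \cdot 16777216 + 0 \cdot 65536 + 0 \cdot 256 + 1 = 167772161
\]
```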

But why does it have to be this way?

Why does the approach with .split not work?

If you send the code from the question to ES, it replies with an error like this:

      "script": "String[] ipAddressInArray = \"1.2.3.4\".split(\"\\\\.\");\n\nlong result = 0;\nfor (int i = 0; i < ipAddressInArray.length; i++) {\n    int power = 3 - i;\n    int ip = Integer.parseInt(ipAddressInArray[i]);\n    long longIP = (ip * Math.pow(256, power)).toLong();\n    result = result + longIP;\n}\nreturn result;",
      "lang": "painless",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Unknown call [split] with [1] arguments on type [String]."

This is because Java's String.split() is not considered safe to use in Painless, since it implicitly compiles a regex Pattern. The Painless documentation suggests using Pattern#split instead, but that requires regexes to be enabled in the node configuration (elasticsearch.yml), not per index.

By default, regexes are disabled:

      "script": "String[] ipAddressInArray = /\\./.split(\"1.2.3.4\");...
      "lang": "painless",
      "caused_by": {
        "type": "illegal_state_exception",
        "reason": "Regexes are disabled. Set [script.painless.regex.enabled] to [true] in elasticsearch.yaml to allow them. Be careful though, regexes break out of Painless's protection against deep recursion and long loops."
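For comparison, this is the Pattern#split route in plain Java, where no such setting exists; the `/\./.split(...)` call in the error above is Painless's regex-literal form of the same idea. The class and method names here are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Plain Java: String.split(regex) compiles a Pattern behind the scenes;
// Pattern#split makes that compiled regex explicit and reusable.
class PatternSplit {
    // Compiled once; "\\." matches a literal dot rather than "any character"
    private static final Pattern DOT = Pattern.compile("\\.");

    static String[] octets(String ip) {
        return DOT.split(ip);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(octets("1.2.3.4"))); // [1, 2, 3, 4]
    }
}
```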

Why do we have to do an explicit cast (char) '.'?

So we have to split the string on dots manually. The straightforward approach is to compare each char of the string with '.' (which in Java is a char literal, not a String).

In Painless, however, '.' is a String literal, so we have to cast it explicitly to char, because we are comparing it with the elements of a char array.
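Another regex-free way to split on dots, sketched in plain Java, is to jump from dot to dot with `String.indexOf(int, int)` and cut with `substring`. The class and method names are hypothetical, and whether these exact String methods are allowlisted in your Painless version is an assumption worth checking before porting it.

```java
import java.util.ArrayList;
import java.util.List;

// Plain Java: split on '.' without any regex.
class ManualSplit {
    static List<String> splitOnDots(String s) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int dot;
        // indexOf returns -1 once no further '.' exists after start
        while ((dot = s.indexOf('.', start)) != -1) {
            parts.add(s.substring(start, dot));
            start = dot + 1;
        }
        parts.add(s.substring(start)); // text after the last dot
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitOnDots("1.2.3.4")); // [1, 2, 3, 4]
    }
}
```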

Why do we have to work with char array directly?

Because Painless apparently does not allow accessing `length` on a String either:

    "reason": {
      "type": "script_exception",
      "reason": "compile error",
      "script_stack": [
        "\"1.2.3.4\".length",
        "         ^---- HERE"
      ],
      "script": "\"1.2.3.4\".length",
      "lang": "painless",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Unknown field [length] for type [String]."
      }
    }
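Note that the error above is about a *field* named `length`. In plain Java, arrays expose `length` as a field while String exposes `length()` as a method, as the sketch below shows (class and method names are hypothetical). Whether Painless accepts the method form `length()` is not covered by the error above and is an assumption to test in your version.

```java
// Plain Java: String has a length() method; arrays have a length field.
class LengthDemo {
    static int stringLength(String s) {
        return s.length(); // method call; s.length without parentheses does not compile
    }

    static int arrayLength(String s) {
        return s.toCharArray().length; // field access on char[], as in the answer's script
    }

    public static void main(String[] args) {
        System.out.println(stringLength("1.2.3.4")); // 7
        System.out.println(arrayLength("1.2.3.4"));  // 7
    }
}
```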

So why is it called Painless?

I couldn't find any historical note on the name after a quick search, but from the documentation page and some experience (like the examples above in this answer), I infer that it is designed to be painless to use in production.

Its predecessor, Groovy, was a ticking bomb due to resource usage and security vulnerabilities. So the Elasticsearch team created a very limited subset of Java/Groovy scripting that has predictable performance and does not contain those security vulnerabilities, and called it Painless.

If there is one thing that is true about the Painless scripting language, it is that it is limited and sandboxed.

answered Nov 30 '22 at 22:11 by Nikolay Vasiliev
