I have a field in a document that holds the string representation of an IPv4 address ("1.2.3.4"); the field is named "originating_ip". I'm trying to use a scripted field written in Painless to add a new field (originating_ip_calc) holding the integer (long) representation of that IPv4 address.
The following script works in Groovy (and from what I understand it should work almost the same way in Painless), but "almost" apparently does not cover this specific case.
String[] ipAddressInArray = "1.2.3.4".split("\\.");
long result = 0;
for (int i = 0; i < ipAddressInArray.length; i++) {
    int power = 3 - i;
    int ip = Integer.parseInt(ipAddressInArray[i]);
    long longIP = (ip * Math.pow(256, power)).toLong();
    result = result + longIP;
}
return result;
I also looked at this question; as you can see, the code above is based on one of the answers there.
I also tried to work with InetAddress, but had no luck.
Select the data view you want to add a scripted field to. Select the Scripted fields tab, then click Add scripted field. Enter a Name for the scripted field, then enter the Script you want to use to compute a value on the fly from your index data. Click Create field.
Wherever scripting is supported in the Elasticsearch APIs, the syntax follows the same pattern; you specify the language of your script, provide the script logic (or source), and add parameters that are passed into the script: "script": { "lang": "...", "source" | "id": "...", "params": { ... } }
Painless is a simple, secure scripting language designed specifically for use with Elasticsearch. It is the default scripting language for Elasticsearch and can safely be used for inline and stored scripts.
With Elasticsearch painless scripting you can use code like the following:
POST ip_search/doc/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "originating_ip_calc": {
      "script": {
        "source": """
          String ip_addr = params['_source']['originating_ip'];
          def ip_chars = ip_addr.toCharArray();
          int chars_len = ip_chars.length;
          long result = 0;
          int cur_power = 0;
          int last_dot = chars_len;
          for(int i = chars_len -1; i>=-1; i--) {
            if (i == -1 || ip_chars[i] == (char) '.' ){
              result += (Integer.parseInt(ip_addr.substring(i+ 1, last_dot)) * Math.pow(256, cur_power));
              last_dot = i;
              cur_power += 1;
            }
          }
          return result
        """,
        "lang": "painless"
      }
    }
  },
  "_source": ["originating_ip"]
}
(Note that I used the Kibana console to send the request to ES; it does some escaping to turn this into valid JSON before sending.)
This will give a response like this:
"hits": [
  {
    "_index": "ip_search",
    "_type": "doc",
    "_id": "2",
    "_score": 1,
    "_source": {
      "originating_ip": "10.0.0.1"
    },
    "fields": {
      "originating_ip_calc": [
        167772161
      ]
    }
  },
  {
    "_index": "ip_search",
    "_type": "doc",
    "_id": "1",
    "_score": 1,
    "_source": {
      "originating_ip": "1.2.3.4"
    },
    "fields": {
      "originating_ip_calc": [
        16909060
      ]
    }
  }
]
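If you want to sanity-check these numbers outside Elasticsearch, here is a minimal plain-Java port of the same right-to-left scan. The class and method names (IpToLongPort, ipToLong) are my own, and the sketch assumes a well-formed dotted-quad string with no validation:

```java
class IpToLongPort {
    // Scan the string from right to left. Each time a dot (or the position
    // just before the start of the string) is reached, parse the octet to
    // its right and weight it by the next power of 256.
    static long ipToLong(String ipAddr) {
        char[] ipChars = ipAddr.toCharArray();
        long result = 0;
        int curPower = 0;
        int lastDot = ipChars.length;
        for (int i = ipChars.length - 1; i >= -1; i--) {
            // The i == -1 check comes first so ipChars[-1] is never read.
            if (i == -1 || ipChars[i] == '.') {
                int octet = Integer.parseInt(ipAddr.substring(i + 1, lastDot));
                result += (long) (octet * Math.pow(256, curPower));
                lastDot = i;
                curPower += 1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ipToLong("1.2.3.4"));   // 16909060
        System.out.println(ipToLong("10.0.0.1"));  // 167772161
    }
}
```

The two printed values match the originating_ip_calc values in the response above.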
But why does it have to be this way?
Why does .split not work?
If you send the code from the question to ES, it replies with an error like this:
"script": "String[] ipAddressInArray = \"1.2.3.4\".split(\"\\\\.\");\n\nlong result = 0;\nfor (int i = 0; i < ipAddressInArray.length; i++) {\n int power = 3 - i;\n int ip = Integer.parseInt(ipAddressInArray[i]);\n long longIP = (ip * Math.pow(256, power)).toLong();\n result = result + longIP;\n}\nreturn result;",
"lang": "painless",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Unknown call [split] with [1] arguments on type [String]."
This is mainly because Java's String.split() is not considered safe to use in Painless (it implicitly treats its argument as a regex Pattern). The suggested alternative is Pattern#split, but that requires regexes to be enabled through the script.painless.regex.enabled setting.
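To see why split is tied to regexes at all, the following plain-Java sketch (not Painless; the class and method names are my own) shows that the argument to String.split is interpreted as a regular expression, and that Pattern.split is the explicit form of the same operation:

```java
import java.util.regex.Pattern;

class SplitIsRegex {
    // Number of parts String.split produces for a given delimiter expression.
    static int partCount(String text, String delimiterRegex) {
        return text.split(delimiterRegex).length;
    }

    // The explicit equivalent: compile the pattern, then split with it.
    static String[] splitWithPattern(String text) {
        return Pattern.compile("\\.").split(text);
    }

    public static void main(String[] args) {
        // Escaped dot: a literal dot, so four octets come back.
        System.out.println(partCount("1.2.3.4", "\\."));  // 4
        // Unescaped dot: matches every character, so nothing is left over.
        System.out.println(partCount("1.2.3.4", "."));    // 0
        System.out.println(String.join("|", splitWithPattern("1.2.3.4")));  // 1|2|3|4
    }
}
```

The unescaped case returning zero parts is the giveaway that a regex is involved, which is presumably what Painless wants to keep behind an explicit setting.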
By default, regexes are disabled:
"script": "String[] ipAddressInArray = /\\./.split(\"1.2.3.4\");...
"lang": "painless",
"caused_by": {
"type": "illegal_state_exception",
"reason": "Regexes are disabled. Set [script.painless.regex.enabled] to [true] in elasticsearch.yaml to allow them. Be careful though, regexes break out of Painless's protection against deep recursion and long loops."
Why the explicit cast (char) '.'?
So, we have to split the string on dots manually. The straightforward approach is to compare each char of the string with '.' (which in Java is a char literal, not a String). In Painless, however, '.' is a String, so we have to cast it explicitly to char (because we are iterating over an array of chars).
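For illustration, here is a plain-Java sketch of the char-by-char approach. It is a variation, not the script from this answer: it scans left to right and folds each octet in with result * 256 + octet, so it needs neither substring nor Math.pow. The names are my own and the input is assumed to be a well-formed IPv4 string:

```java
class IpCharScan {
    // Forward scan: build the current octet digit by digit, and fold it
    // into the result whenever a dot is seen.
    static long ipToLong(String ipAddr) {
        long result = 0;
        int octet = 0;
        for (char c : ipAddr.toCharArray()) {
            // In Java '.' is already a char literal; the Painless script
            // needs the explicit (char) '.' cast for the same comparison.
            if (c == '.') {
                result = result * 256 + octet;
                octet = 0;
            } else {
                octet = octet * 10 + (c - '0');  // assumes c is a digit
            }
        }
        return result * 256 + octet;  // fold in the last octet
    }

    public static void main(String[] args) {
        System.out.println(ipToLong("1.2.3.4"));   // 16909060
        System.out.println(ipToLong("10.0.0.1"));  // 167772161
    }
}
```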
Why do we have to work with a char array?
Because, apparently, Painless does not allow .length on a String either. Note that the error below is about a field: as in Java, the length of a String is the method length(), whereas arrays expose a length field.
"reason": {
  "type": "script_exception",
  "reason": "compile error",
  "script_stack": [
    "\"1.2.3.4\".length",
    " ^---- HERE"
  ],
  "script": "\"1.2.3.4\".length",
  "lang": "painless",
  "caused_by": {
    "type": "illegal_argument_exception",
    "reason": "Unknown field [length] for type [String]."
  }
}
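For comparison, plain Java draws the same line between a String and a char array, which fits the "Unknown field [length]" wording of the error. This is a small Java sketch with names of my own choosing; I am assuming Painless mirrors Java here and have not verified it against every Elasticsearch version:

```java
class LengthDemo {
    // For a String, length is a method call.
    static int stringLength(String s) {
        return s.length();
    }

    // For an array, length is a field.
    static int arrayLength(String s) {
        char[] chars = s.toCharArray();
        return chars.length;
    }

    public static void main(String[] args) {
        System.out.println(stringLength("1.2.3.4"));  // 7
        System.out.println(arrayLength("1.2.3.4"));   // 7
    }
}
```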
Why is it called painless?
Although I couldn't find any historical note on the naming after some quick googling, from the documentation page and some experience (like the above in this answer) I can infer that it is designed to be painless to use in production.
Its predecessor, Groovy, was a ticking bomb due to resource usage and security vulnerabilities. So the Elasticsearch team created a very limited subset of Java/Groovy scripting that would have predictable performance and would not contain those security vulnerabilities, and called it painless.
If there is anything true about the painless scripting language, it is that it is limited and sandboxed.