I have a json object that looks like this (prodused by i3-msg -t get_workspaces
.
[
{
"name": "1",
"urgent": false
},
{
"name": "2",
"urgent": false
},
{
"name": "something",
"urgent": false
}
]
I am trying to use jq
to figure out which index number in the list is based on a select
query. jq
have something called index()
, but it seams to support only strings?
Using something like i3-msg -t get_workspaces | jq '.[] | select(.name=="something")'
gives me the object I want. But I want it's index. In this case 2
(starting counting at 0)
Is this possible using jq
alone?
To find the index of an object in an array, by a specific property: Use the map() method to iterate over the array, returning only the value of the relevant property. Call the indexOf() method on the returned from map array. The indexOf method returns the index of the first occurrence of a value in an array.
A jq program is a "filter": it takes an input, and produces an output. There are a lot of builtin filters for extracting a particular field of an object, or converting a number to a string, or various other standard tasks.
jq filters run on a stream of JSON data. The input to jq is parsed as a sequence of whitespace-separated JSON values which are passed through the provided filter one at a time. The output(s) of the filter are written to standard out, again as a sequence of whitespace-separated JSON data.
So I provided a strategy for a solution to the OP, which OP quickly accepted. Subsequently @peak and @Jeff Mercado offered better and more complete solutions. So I have turned this into a community wiki. Please improve this answer if you can.
A straightforward solution (pointed out by @peak) is to use the builtin function, index
:
map(.name == "something") | index(true)
The jq
documentation confusingly suggests that index
operates on strings, but it operates on arrays as well. Thus index(true)
returns the index of the first true
in the array of booleans produced by the map. If there is no item satisfying the condition, the result is null.
jq expresions are evaluated in a "lazy" manner, but map
will traverse the entire input array. We can verify this by rewriting the above code and introducing some debug statements:
[ .[] | debug | .name == "something" ] | index(true)
As suggested by @peak, the key to doing better is to use the break
statement introduced in jq 1.5:
label $out |
foreach .[] as $item (
-1;
.+1;
if $item.name == "something" then
.,
break $out
else
empty
end
) // null
Note that the //
is no comment; it is the alternative operator. If the name is not found the foreach
will return empty
which will be converted to null by the alternative operator.
Another approach is to recursively process the array:
def get_index(name):
name as $name |
if (. == []) then
null
elif (.[0].name == $name) then
0
else
(.[1:] | get_index($name)) as $result |
if ($result == null) then null else $result+1 end
end;
get_index("something")
However this recursive implementation will use stack space proportional to the length of the array in the worst case as pointed out by @Jeff Mercado. In version 1.5 jq
introduced Tail Call Optimization (TCO) which will allow us to optimize this away using a local helper function (note that this is minor adaptation to a solution provided by @Jeff Mercado so as to be consistent with the above examples):
def get_index(name):
name as $name |
def _get_index:
if (.i >= .len) then
null
elif (.array[.i].name == $name) then
.i
else
.i += 1 | _get_index
end;
{ array: ., i: 0, len: length } | _get_index;
get_index("something")
According to @peak obtaining the length of an array in jq
is a constant time operation, and apparently indexing an array is inexpensive as well. I will try to find a citation for this.
Now let's try to actually measure. Here is an example of measuring the simple solution:
#!/bin/bash
jq -n '
def get_index(name):
name as $name |
map(.name == $name) | index(true)
;
def gen_input(n):
n as $n |
if ($n == 0) then
[]
else
gen_input($n-1) + [ { "name": $n, "urgent":false } ]
end
;
2000 as $n |
gen_input($n) as $i |
[(0 | while (.<$n; [ ($i | get_index(.)), .+1 ][1]))][$n-1]
'
When I run this on my machine, I get the following:
$ time ./simple
1999
real 0m10.024s
user 0m10.023s
sys 0m0.008s
If I replace this with the "fast" version of get_index:
def get_index(name):
name as $name |
label $out |
foreach .[] as $item (
-1;
.+1;
if $item.name == $name then
.,
break $out
else
empty
end
) // null;
Then I get:
$ time ./fast
1999
real 0m13.165s
user 0m13.173s
sys 0m0.000s
And if I replace it with the "fast" recursive version:
def get_index(name):
name as $name |
def _get_index:
if (.i >= .len) then
null
elif (.array[.i].name == $name) then
.i
else
.i += 1 | _get_index
end;
{ array: ., i: 0, len: length } | _get_index;
I get:
$ time ./fast-recursive
1999
real 0m52.628s
user 0m52.657s
sys 0m0.005s
Ouch! But we can do better. @peak mentioned an undocumented switch --debug-dump-disasm
which lets you see how jq
is compiling your code. With this you can see that modifying and passing the object to _indexof
and then extracting the array, length, and index is expensive. Refactoring to just pass the index is a huge improvement, and a further refinement to avoid testing the index against the length makes it competitive with the iterative version:
def indexof($name):
(.+[{name: $name}]) as $a | # add a "sentinel"
length as $l | # note length sees original array
def _indexof:
if ($a[.].name == $name) then
if (. != $l) then . else null end
else
.+1 | _indexof
end
;
0 | _indexof
;
I get:
$ time ./fast-recursive2
null
real 0m13.238s
user 0m13.243s
sys 0m0.005s
So it appears that if each element is equally likely, and you want an average case performance, you should stick with the simple implementation. (C-coded functions tend to be fast!)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With