I have a json object that looks like this (prodused by <code>i3-msg -t get_workspaces</code>. <pre class="prettyprint"><code>[ { "name": "1", "urgent": false }, { "name": "2", "urgent": false }, { "name": "something", "urgent": false } ] </code></pre> I am trying to use <code>jq</code> to figure out which index number in the list is based on a <code>select</code> query. <code>jq</code> have something called <code>index()</code>, but it seams to support only strings? Using something like <code>i3-msg -t get_workspaces | jq '.[] | select(.name=="something")'</code> gives me the object I want. But I want it's index. In this case <code>2</code> (starting counting at 0) Is this possible using <code>jq</code> alone?

So I provided a strategy for a solution to the OP, which OP quickly accepted. Subsequently @peak and @Jeff Mercado offered better and more complete solutions. So I have turned this into a community wiki. Please improve this answer if you can. A straightforward solution (pointed out by @peak) is to use the builtin function, <code>index</code>: <pre class="prettyprint"><code>map(.name == "something") | index(true) </code></pre> The <code>jq</code> documentation confusingly suggests that <code>index</code> operates on strings, but it operates on arrays as well. Thus <code>index(true)</code> returns the index of the first <code>true</code> in the array of booleans produced by the map. If there is no item satisfying the condition, the result is null. jq expresions are evaluated in a "lazy" manner, but <code>map</code> will traverse the entire input array. We can verify this by rewriting the above code and introducing some debug statements: <pre class="prettyprint"><code>[ .[] | debug | .name == "something" ] | index(true) </code></pre> As suggested by @peak, the key to doing better is to use the <code>break</code> statement introduced in jq 1.5: <pre class="prettyprint"><code>label $out | foreach .[] as $item ( -1; .+1; if $item.name == "something" then ., break $out else empty end ) // null </code></pre> Note that the <code>//</code> is no comment; it is the alternative operator. If the name is not found the <code>foreach</code> will return <code>empty</code> which will be converted to null by the alternative operator. Another approach is to recursively process the array: <pre class="prettyprint"><code>def get_index(name): name as $name | if (. == []) then null elif (.[0].name == $name) then 0 else (.[1:] | get_index($name)) as $result | if ($result == null) then null else $result+1 end end; get_index("something") </code></pre> However this recursive implementation will use stack space proportional to the length of the array in the worst case as pointed out by @Jeff Mercado. In version 1.5 <code>jq</code> introduced Tail Call Optimization (TCO) which will allow us to optimize this away using a local helper function (note that this is minor adaptation to a solution provided by @Jeff Mercado so as to be consistent with the above examples): <pre class="prettyprint"><code>def get_index(name): name as $name | def _get_index: if (.i >= .len) then null elif (.array[.i].name == $name) then .i else .i += 1 | _get_index end; { array: ., i: 0, len: length } | _get_index; get_index("something") </code></pre> According to @peak obtaining the length of an array in <code>jq</code> is a constant time operation, and apparently indexing an array is inexpensive as well. I will try to find a citation for this. Now let's try to actually measure. Here is an example of measuring the simple solution: <pre class="prettyprint"><code>#!/bin/bash jq -n ' def get_index(name): name as $name | map(.name == $name) | index(true) ; def gen_input(n): n as $n | if ($n == 0) then [] else gen_input($n-1) + [ { "name": $n, "urgent":false } ] end ; 2000 as $n | gen_input($n) as $i | [(0 | while (.<$n; [ ($i | get_index(.)), .+1 ][1]))][$n-1] ' </code></pre> When I run this on my machine, I get the following: <pre class="prettyprint"><code>$ time ./simple 1999 real 0m10.024s user 0m10.023s sys 0m0.008s </code></pre> If I replace this with the "fast" version of get_index: <pre class="prettyprint"><code>def get_index(name): name as $name | label $out | foreach .[] as $item ( -1; .+1; if $item.name == $name then ., break $out else empty end ) // null; </code></pre> Then I get: <pre class="prettyprint"><code>$ time ./fast 1999 real 0m13.165s user 0m13.173s sys 0m0.000s </code></pre> And if I replace it with the "fast" recursive version: <pre class="prettyprint"><code>def get_index(name): name as $name | def _get_index: if (.i >= .len) then null elif (.array[.i].name == $name) then .i else .i += 1 | _get_index end; { array: ., i: 0, len: length } | _get_index; </code></pre> I get: <pre class="prettyprint"><code>$ time ./fast-recursive 1999 real 0m52.628s user 0m52.657s sys 0m0.005s </code></pre> Ouch! But we can do better. @peak mentioned an undocumented switch <code>--debug-dump-disasm</code> which lets you see how <code>jq</code> is compiling your code. With this you can see that modifying and passing the object to <code>_indexof</code> and then extracting the array, length, and index is expensive. Refactoring to just pass the index is a huge improvement, and a further refinement to avoid testing the index against the length makes it competitive with the iterative version: <pre class="prettyprint"><code>def indexof($name): (.+[{name: $name}]) as $a | # add a "sentinel" length as $l | # note length sees original array def _indexof: if ($a[.].name == $name) then if (. != $l) then . else null end else .+1 | _indexof end ; 0 | _indexof ; </code></pre> I get: <pre class="prettyprint"><code>$ time ./fast-recursive2 null real 0m13.238s user 0m13.243s sys 0m0.005s </code></pre> So it appears that if each element is equally likely, and you want an average case performance, you should stick with the simple implementation. (C-coded functions tend to be fast!)

Getting the object array index in jq

Tags:

json

arrays

indexof

jq

I have a json object that looks like this (prodused by i3-msg -t get_workspaces.

[
  {
    "name": "1",
    "urgent": false
  },
  {
    "name": "2",
    "urgent": false
  },
  {
    "name": "something",
    "urgent": false
  }
]

I am trying to use jq to figure out which index number in the list is based on a select query. jq have something called index(), but it seams to support only strings?

Using something like i3-msg -t get_workspaces | jq '.[] | select(.name=="something")' gives me the object I want. But I want it's index. In this case 2 (starting counting at 0)

Is this possible using jq alone?

771

asked Jan 31 '17 13:01

xeor

1 Answers

So I provided a strategy for a solution to the OP, which OP quickly accepted. Subsequently @peak and @Jeff Mercado offered better and more complete solutions. So I have turned this into a community wiki. Please improve this answer if you can.

A straightforward solution (pointed out by @peak) is to use the builtin function, index:

map(.name == "something") | index(true)

The jq documentation confusingly suggests that index operates on strings, but it operates on arrays as well. Thus index(true) returns the index of the first true in the array of booleans produced by the map. If there is no item satisfying the condition, the result is null.

jq expresions are evaluated in a "lazy" manner, but map will traverse the entire input array. We can verify this by rewriting the above code and introducing some debug statements:

[ .[] | debug | .name == "something" ] | index(true)

As suggested by @peak, the key to doing better is to use the break statement introduced in jq 1.5:

label $out | 
foreach .[] as $item (
  -1; 
  .+1; 
  if $item.name == "something" then 
    ., 
    break $out 
  else 
    empty
  end
) // null

Note that the // is no comment; it is the alternative operator. If the name is not found the foreach will return empty which will be converted to null by the alternative operator.

Another approach is to recursively process the array:

def get_index(name): 
  name as $name | 
  if (. == []) then
    null
  elif (.[0].name == $name) then 
    0 
  else 
    (.[1:] | get_index($name)) as $result |
    if ($result == null) then null else $result+1 end      
end;
get_index("something")

However this recursive implementation will use stack space proportional to the length of the array in the worst case as pointed out by @Jeff Mercado. In version 1.5 jq introduced Tail Call Optimization (TCO) which will allow us to optimize this away using a local helper function (note that this is minor adaptation to a solution provided by @Jeff Mercado so as to be consistent with the above examples):

def get_index(name): 
  name as $name | 
  def _get_index:
    if (.i >= .len) then
      null
    elif (.array[.i].name == $name) then
      .i
    else
      .i += 1 | _get_index
    end;
  { array: ., i: 0, len: length } | _get_index;
get_index("something")

According to @peak obtaining the length of an array in jq is a constant time operation, and apparently indexing an array is inexpensive as well. I will try to find a citation for this.

Now let's try to actually measure. Here is an example of measuring the simple solution:

#!/bin/bash

jq -n ' 

  def get_index(name): 
    name as $name |
    map(.name == $name) | index(true)
  ;

  def gen_input(n):  
    n as $n |
    if ($n == 0) then 
      []
    else
      gen_input($n-1) + [ { "name": $n, "urgent":false } ]
    end
  ;  

  2000 as $n |
  gen_input($n) as $i |
  [(0 | while (.<$n; [ ($i | get_index(.)), .+1 ][1]))][$n-1]
'

When I run this on my machine, I get the following:

$ time ./simple
1999

real    0m10.024s
user    0m10.023s
sys     0m0.008s

If I replace this with the "fast" version of get_index:

def get_index(name): 
  name as $name |
  label $out | 
  foreach .[] as $item (
    -1; 
    .+1; 
  if $item.name == $name then 
    ., 
    break $out 
  else 
    empty
  end
) // null;

Then I get:

$ time ./fast
1999

real    0m13.165s
user    0m13.173s
sys     0m0.000s

And if I replace it with the "fast" recursive version:

def get_index(name): 
  name as $name | 
  def _get_index:
    if (.i >= .len) then
      null
    elif (.array[.i].name == $name) then
      .i
    else
      .i += 1 | _get_index
    end;
  { array: ., i: 0, len: length } | _get_index;

I get:

$ time ./fast-recursive 
1999

real    0m52.628s
user    0m52.657s
sys     0m0.005s

Ouch! But we can do better. @peak mentioned an undocumented switch --debug-dump-disasm which lets you see how jq is compiling your code. With this you can see that modifying and passing the object to _indexof and then extracting the array, length, and index is expensive. Refactoring to just pass the index is a huge improvement, and a further refinement to avoid testing the index against the length makes it competitive with the iterative version:

def indexof($name):
  (.+[{name: $name}]) as $a | # add a "sentinel"
  length as $l | # note length sees original array
  def _indexof:
    if ($a[.].name == $name) then
      if (. != $l) then . else null end
    else
      .+1 | _indexof
    end
  ;


  0 | _indexof
;

I get:

$ time ./fast-recursive2
null

real    0m13.238s
user    0m13.243s
sys     0m0.005s

So it appears that if each element is equally likely, and you want an average case performance, you should stick with the simple implementation. (C-coded functions tend to be fast!)

192

answered Sep 19 '22 14:09

13 revs, 2 users 99%

Related questions
                            
                                How can I slice each element of a numpy array of strings?
                            
                                MongoDB unwind multiple arrays
                            
                                delphi array of string stringlist conversion
                            
                                Index expression must be constant - WebGL/GLSL error
                            
                                Asynchronous access to an array in Firebase
                            
                                Why do strings behave like an array in PHP 5.3?
                            
                                Copy Arrays to Array
                            
                                Initializing "a pointer to an array of integers"
                            
                                Elasticsearch - Filter where (one of nested array) and (all of nested array)
                            
                                PHP access all $_POST[] variables into an array?
                            
                                Why push method is significantly slower than putting values via array indices in Javascript
                            
                                How can I "zip" two arrays in PowerShell?
                            
                                Fill Bounding Boxes in 2D array
                            
                                VBA variable/array default values
                            
                                Array Reverse is not working for me ...
                            
                                How to merge duplicates in an array of objects and sum a specific property? [duplicate]
                            
                                Efficient data structure for fast random access, search, insertion and deletion
                            
                                Subsetting data in Python
                            
                                Search array and return all keys and values when match found
                            
                                Array index from first to second last in PowerShell

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With