
How can I debug a stdin error that jq is throwing at me?

Given the following dummy data

[{
  "submitter": {
    "user_fields": {
      "region": "Colombia"
    }
  }
}, {
  "submitter": {
    "user_fields": {
      "region": "China"
    }
  }
}, {
  "submitter": {
    "user_fields": {
      "region": "China"
    }
  }
}, {
  "submitter": {
    "user_fields": {
      "region": "Mexico"
    }
  }
}, {
  "submitter": {
    "user_fields": {
      "region": "Canada"
    }
  }
}]

contained in

fulldata.json

I am trying to filter the objects with

"region": "China"

and then I want all the objects that satisfy the filter to populate a new file

chinadata.json

Now, this is what I did:

cat fulldata.json | jq 'select(.submitter.user_fields.region == "China")' > chinadata.json

and indeed a new

chinadata.json 

has been created in the process and it contains (it seems) all the correct info.

The problem, though, is that I get a bunch of lines with errors of this kind while the previous command is run:

jq: error (at <stdin>:45380): Cannot index number with string "user_fields"

Question: how can I use the information in the error message to inspect exactly the objects that caused the error? I would like to correct any formatting mistakes, but the dataset is so big that I cannot realistically scroll through it.

Any idea would make me happy, thank you!

UntilThen asked Dec 12 '17 02:12


1 Answer

  1. Rather than using cat (which is inefficient and hides the filename from jq), invoke jq with the filename as an argument.

  2. When I do this with your filter and sample data, I get the error message:

    jq: error (at fulldata.json:31): Cannot index array with string "submitter"

Here "31" is the line number corresponding to the end of the file, that is, the end of the array. So the error message is saying: "you are trying to apply the index operation (.["submitter"]) to an array". Arrays can only be indexed by integers, so what is going on? Your query applies to objects, not to arrays, and here the top-level input is a single array.

  3. So a simple workaround is to wrap your query in a call to map(), i.e. map(select(...)), so that the select is applied to each object in the array. With the sample data this succeeds.

  4. Another approach to debugging would be to use debug. You can sprinkle as many debugs in the query as you like. You could, for example, start with:

    select(debug | .submitter.user_fields.region == "China")

  5. Suppose now there is a spurious object in the array:

    { "submitter": 0 }

Running our map(select(...)) program, we get:

jq: error (at fulldata.json:35): Cannot index number with string "user_fields"

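This is the same kind of error message you got. As before, the number after the colon is the input line jq had read up to when the error was raised, so when the whole file is a single array it corresponds to the end of the array rather than to the offending object itself.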
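To reproduce both outcomes on a small scale, here is a minimal sketch. The file names sample_ok.json and sample_bad.json are hypothetical stand-ins for fulldata.json, and the records in them are made up for illustration.

```shell
# Hypothetical sample files standing in for fulldata.json.
cat > sample_ok.json <<'EOF'
[
  {"submitter": {"user_fields": {"region": "Colombia"}}},
  {"submitter": {"user_fields": {"region": "China"}}}
]
EOF

cat > sample_bad.json <<'EOF'
[
  {"submitter": {"user_fields": {"region": "China"}}},
  {"submitter": 0}
]
EOF

# map() applies the select to each element of the array,
# so the clean file is filtered without any error.
jq -c 'map(select(.submitter.user_fields.region == "China"))' sample_ok.json

# With the spurious record present, jq reports a "Cannot index number ..."
# error on stderr, prints no result, and exits with a non-zero status.
jq -c 'map(select(.submitter.user_fields.region == "China"))' sample_bad.json
echo "exit status: $?"
```

The exact wording of the error message may differ slightly between jq versions, so it is safer to rely on the exit status than on the message text in scripts.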

  6. If you would rather just skip the anomalous records, consider using jq's (only) postfix ? operator, e.g.

    map( select(.submitter?.user_fields?.region? == "China") )

  7. If you want the index in the array where the problematic object is, then consider first adding an index, which can be done like so:

    range(0;length) as $i | [$i, .[$i]]

This converts the array into a stream of pairs of the form [i, object], where i is the index (starting at 0). You can then easily modify your query so that in case of an error, you can print the corresponding value of i. For example:

   range(0;length) as $i | [$i, .[$i]]
   | . as $pair
   | try (.[1] | select(.submitter.user_fields.region == "China"))
     catch ($pair[0] | error(tostring))
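
The program above stops at the first bad record, because error(...) raises a new error from inside catch. If it is more useful to list every offending index in one pass, one possible variation (a sketch, not part of the approach above) is to have catch emit a small report object instead. The file name sample.json and the keys index, message, and record are made up for illustration.

```shell
# Hypothetical sample file with two malformed records (indices 1 and 3).
cat > sample.json <<'EOF'
[
  {"submitter": {"user_fields": {"region": "China"}}},
  {"submitter": 0},
  {"submitter": {"user_fields": {"region": "Mexico"}}},
  {"submitter": {"user_fields": "n/a"}}
]
EOF

# For each index, walk the same path the filter uses. On success nothing is
# printed (empty); on failure, catch receives jq's error message as its input
# and emits one compact report line with the index and the record itself.
jq -c '
  range(0; length) as $i
  | .[$i] as $rec
  | try ($rec | .submitter.user_fields.region | empty)
    catch {index: $i, message: ., record: $rec}
' sample.json
```

Each output line then identifies one record to fix, and a single record can be inspected afterwards with something like jq '.[3]' sample.json.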

jq actually offers quite a lot of debugging support, including debug, try ... catch ..., and error(...) as already mentioned; these and some other goodies (e.g. input_line_number) are documented in the reference manual.
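
As a rough illustration of input_line_number, the sketch below assumes the input is a stream with one JSON object per line rather than a single array; that layout, the file name stream.json, and the keys line and message are assumptions made for this example. In that layout the reported number should correspond to the line of the record being processed, but treat it as a hint and check the line it names.

```shell
# Hypothetical stream input: one JSON object per line, not wrapped in an array.
cat > stream.json <<'EOF'
{"submitter": {"user_fields": {"region": "China"}}}
{"submitter": 0}
{"submitter": {"user_fields": {"region": "Canada"}}}
EOF

# jq runs the program once per input object. Records that raise an error are
# reported together with the input line number jq has reached at that point.
jq -c '
  try (.submitter.user_fields.region | empty)
  catch {line: input_line_number, message: .}
' stream.json
```

A reported line N can then be printed without opening the file in an editor, for example with sed -n 'Np' stream.json (replacing N with the number).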

peak answered Oct 31 '22 17:10