I'd like to use jq to do a left join on these two objects in an array by using the jq JOIN operator:
[
  {
    "type": "spreadsheet",
    "rows": [
      [
        "1",
        "Ben",
        "male"
      ],
      [
        "2",
        "Cathy",
        "female"
      ]
    ],
    "columns": [
      "id",
      "name",
      "sex"
    ]
  },
  {
    "type": "spreadsheet",
    "rows": [
      [
        "1",
        "25"
      ],
      [
        "2",
        "28"
      ]
    ],
    "columns": [
      "id",
      "age"
    ]
  }
]
to this:
{
    "type": "spreadsheet",
    "rows": [
      [
        "1",
        "Ben",
        "male",
        "25"
      ],
      [
        "2",
        "Cathy",
        "female",
        "28"
      ]
    ],
    "columns": [
      "id",
      "name",
      "sex",
      "age"
    ]
}
The jq manual references SQL-style operators like INDEX and JOIN but I can't find any examples online of how to use them. Like any join, I need to iterate over the rows in the first object and merge with rows found in the second object based on the id column.
The following adopts an object-oriented approach to the task, and ignores keys other than "rows" and "columns" in the input.
There are accordingly three steps:
toOO;left_join;toDB)The main program is thus quite short:
map(toOO) | left_join(.id) | toDB   
The helper functions, most of which are independently useful, are presented below but should appear before the main program above.
# headers should be an array of strings
def objectify(headers):
  . as $in
  | reduce range(0; headers|length) as $i ({};
      . + {(headers[$i]): $in[$i]} );
def fromObject:
  . as $in
  | reduce keys_unsorted[] as $k ( {rows:[], columns:[]};
     .columns += [$k] | .rows += [$in[$k]] );
# Input: an array of object
# Output: a single object with array-valued keys formed from
# the input object by (blindly) concatenating the values
def merge:
  def allkeys: map(keys) | add | unique;
  allkeys as $allkeys
  | reduce .[] as $in ({};
     reduce $allkeys[] as $k (.;
      . + {($k): (.[$k] + [$in[$k]]) } ));
# id should be a filter such as .id
# Apply merge but do not replicate id
def merge_by(id):
  merge | (id |= .[0] ) ;
# Create an object-representation of the columns/rows database
def toOO:
  .columns as $headers
  | [.rows[] | objectify($headers)];
def toDB:
  map(fromObject)
  | merge_by(.columns);
# Input: an array of two arrays, each consisting of objects forming a database.
# "key" specifies the (possibly composite) key to be used to form the
# left-join of the two databases. 
# That is, objects are NOT added to the first database.
def left_join(key):
  def updateObject(obj):
    reduce .[] as $x ([];
      if ($x|key) == (obj|key) 
      then . + [$x + obj] else . + [$x]
      end);
  reduce .[1][] as $x (.[0]; updateObject($x) );
{
  "columns": [
    "id",
    "name",
    "sex",
    "age"
  ],
  "rows": [
    [
      "1",
      "Ben",
      "male",
      "25"
    ],
    [
      "2",
      "Cathy",
      "female",
      "28"
    ]
  ]
}
JOIN builtinIf your jq has JOIN/4, then it can be used to implement left_join/1 as follows:
def left_join(key):
  map(INDEX(key)) 
  | [ JOIN( .[1]; .[0][]; key; add) ] ;
In theory, this should be more efficient than the definition given above.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With