
Using jq to combine json files, getting file list length too long error

Tags: json, linux, jq

I'm using jq to concatenate the JSON files in a directory.

The directory contains a few hundred thousand files.

jq -s '.' *.json > output.json

fails with an "Argument list too long" error. Is there a way to write this so it can handle more files?

Kristian asked Nov 23 '15 23:11


People also ask

How do I combine large JSON files?

If you want to combine JSON files into a single file, you cannot simply concatenate them, since you will almost certainly get a JSON syntax error. The only safe way to combine multiple files is to read them into an array, which serializes to valid JSON.

Does jq use JSONPath?

JSONPath distinguishes between the "root object or element" ($) and "the current object or element" (.). jq simply uses . to refer to the current JSON entity and so it is context-dependent: it can refer to items in the input stream of the jq process as a whole, or to the output of a filter.
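A small illustration of that context dependence: applied to a stream of several input values, . names each value in turn.

```shell
# '.' refers to each value in the jq input stream in turn,
# so this increments both numbers
echo '1 2' | jq '. + 1'
```

This prints 2 and then 3, one result per input value.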

What is jq format?

jq is a free open source JSON processor that is flexible and straightforward to use. It allows users to display a JSON file using standard formatting, or to retrieve certain records or attribute-value pairs from it.

What does the jq command do?

The jq command transforms JSON data into a more readable format and prints it to standard output on Linux. jq is built around filters, which are used to find and print only the required data from a JSON file.
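For instance (a minimal sketch, with made-up data): a path filter such as .name selects a single attribute from the input.

```shell
# extract one attribute-value pair; -r prints the raw string
# instead of a JSON-quoted one
echo '{"name":"alice","age":30}' | jq -r '.name'
```

This prints alice; running the same input through the identity filter . instead would pretty-print the whole object.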


2 Answers

If jq -s . *.json > output.json produces "argument list too long", you could fix it using zargs in zsh:

$ zargs *.json -- cat | jq -s . > output.json

You can emulate that using find, as shown in @chepner's answer:

$ find . -maxdepth 1 -name \*.json -exec cat {} + | jq -s . > output.json

"Data in jq is represented as streams of JSON values ... This is a cat-friendly format - you can just join two JSON streams together and get a valid JSON stream.":

$ echo '{"a":1}{"b":2}' | jq -s .
[
  {
    "a": 1
  },
  {
    "b": 2
  }
]
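Another way around the limit (a sketch, not from the original answers): printf is a shell builtin, so the expanded glob never passes through exec() and is not subject to ARG_MAX; xargs -0 then runs cat in batches that fit the limit.

```shell
# printf is a builtin, so the long argument list never hits ARG_MAX;
# xargs -0 splits the NUL-delimited names into batches for cat
printf '%s\0' *.json | xargs -0 cat | jq -s . > output.json
```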
jfs answered Oct 23 '22 15:10


[EDITED to use find]

One obvious thing to consider would be to process one file at a time, and then "slurp" them:

$ while IFS= read -r f; do cat "$f"; done < <(find . -maxdepth 1 -name "*.json") | jq -s .

This, however, would presumably require a lot of memory. Thus the following may be closer to what you need:

#!/bin/bash
# "slurp" a bunch of files
# Requires a version of jq with 'inputs'.
echo "["
while IFS= read -r f
do
  jq -nr 'inputs | (., ",")' "$f"
done < <(find . -maxdepth 1 -name "*.json") | sed '$d'
echo "]"
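A self-contained way to try this technique (sample file names are illustrative; the loop is written as a portable pipeline instead of bash process substitution):

```shell
# sample inputs in a scratch directory
cd "$(mktemp -d)"
echo '{"a":1}' > one.json
echo '{"b":2}' > two.json

# emit "[", stream each file through jq followed by a comma,
# drop the trailing comma with sed, then emit "]"
{
  echo "["
  find . -maxdepth 1 -name "*.json" |
    while IFS= read -r f; do
      jq -nr 'inputs | (., ",")' "$f"
    done | sed '$d'
  echo "]"
} | jq length    # two files in, so the combined array has length 2
```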
peak answered Oct 23 '22 13:10