Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert JSON lines to JSON array using jq

Tags:

Firstly, I'm new to jq, like 1 day new, I'm also new to JSON, I'm an SQL guy so I'm learning fast but can't get my head around this ... so please bear with me.

I'm running Windows, using jq v1.5 on PowerShell.

I have multiple JSON files downloaded and they look like this:

{"Header":{"AssetClass":"Commodities","InstrumentType":"Forward","UseCase":"Forward","Level":"InstRefDataReporting"},"Attributes":{"NotionalCurrency":"EUR","ExpiryDate":"2018-01-01","ReturnorPayoutTrigger":"Contract for Difference (CFD)","DeliveryType":"CASH","BaseProduct":"AGRI","TransactionType":"FUTR","FinalPriceType":"ARGM","ReferenceRate":"10PPM ULTRA LOW SULPHUR DIESEL-CARGOES CIF NWE/BASIS ARA-PLATTS EUROPEAN","SubProduct":"GROS","AdditionalSubProduct":"FWHT"},"ISIN":{"ISIN":"EZX27M86B860","Status":"New"},"TemplateVersion":1,"Derived":{"CommodityDerivativeIndicator":"TRUE","UnderlyingAssetType":"Agriculture","IssuerorOperatoroftheTradingVenueIdentifier":"NA","PriceMultiplier":1,"FullName":"Commodities Forward AGRI GROS FWHT EUR 20180101","ShortName":"NA/Fwd AGRI FWHT EUR 20180101","ClassificationType":"JTAXCC"}}
{"Header":{"AssetClass":"Commodities","InstrumentType":"Swap","UseCase":"Basis_Swap","Level":"InstRefDataReporting"},"Attributes":{"NotionalCurrency":"SOS","ExpiryDate":"2208-12-10","ReturnorPayoutTrigger":"Total Return","DeliveryType":"OPTL","TransactionType":"ORIT","FinalPriceType":"IHSM","ReferenceRate":"NATURAL GAS-MONTHLY INDEX S. TEXAS (TETCO)-GAS DAILY PRICE GUIDE","OtherReferenceRate":"NATURAL GAS-MONTHLY INDEX W. LOUISIANA (TETCO)-GAS DAILY PRICE GUIDE","BaseProduct":"OTHR","OtherBaseProduct":"OTHR","SubProduct":"","AdditionalSubProduct":"","OtherSubProduct":"","OtherAdditionalSubProduct":""},"ISIN":{"ISIN":"EZBBH1XR9GV6","Status":"New"},"TemplateVersion":1,"Derived":{"CommodityDerivativeIndicator":"TRUE","UnderlyingAssetType":"Multi Commodity","IssuerorOperatoroftheTradingVenueIdentifier":"NA","PriceMultiplier":1,"FullName":"Commodities Swap Basis_Swap OTHR   OTHR   SOS 22081210","ShortName":"NA/Swap OTHR   SOS 22081210","ClassificationType":"STQTXE"}}
{"Header":{"AssetClass":"Commodities","InstrumentType":"Swap","UseCase":"Multi_Exotic_Swap","Level":"InstRefDataReporting"},"Attributes":{"NotionalCurrency":"LRD","ExpiryDate":"2200-01-31","ReturnorPayoutTrigger":"Contract for Difference (CFD)","DeliveryType":"CASH","TransactionType":"TAPO","FinalPriceType":"EXOF","UnderlyingInstrumentIndex":["BCOMF6","BCOMNG3"]},"ISIN":{"ISIN":"EZ286HJVY4Q2","Status":"New"},"TemplateVersion":1,"Derived":{"CommodityDerivativeIndicator":"TRUE","IssuerorOperatoroftheTradingVenueIdentifier":"NA","PriceMultiplier":1,"UnderlyingAssetType":"Multi Commodity","BaseProduct":"MCEX","SubProduct":"","AdditionalSubProduct":"","FullName":"Commodities Multi_Exotic_Swap MCEX LRD 22000131","ShortName":"NA/Swap MCEX LRD 22000131","ClassificationType":"STQCXC"}}
{"Header":{"AssetClass":"Commodities","InstrumentType":"Option","UseCase":"Option","Level":"InstRefDataReporting"},"Attributes":{"NotionalCurrency":"TND","ExpiryDate":"2209-10-18","OptionType":"OPTL","OptionExerciseStyle":"AMER","ValuationMethodorTrigger":"Asian","DeliveryType":"CASH","TransactionType":"OTHR","FinalPriceType":"IHSM","ReferenceRate":"NATURAL GAS-NGPL (NICOR, NIPSCO, PGLC CITYGATE), NBPL-NICOR-ICE/10X MONTHLY","BaseProduct":"OTHR","SubProduct":"","AdditionalSubProduct":""},"ISIN":{"ISIN":"EZ2TK5CWL9Y4","Status":"New"},"TemplateVersion":1,"Derived":{"CommodityDerivativeIndicator":"TRUE","UnderlyingAssetType":"Other","IssuerorOperatoroftheTradingVenueIdentifier":"NA","PriceMultiplier":1,"FullName":"Commodities Option OTHR   TND 22091018","ShortName":"NA/O OTHR  OPTL TND 22091018","ClassificationType":"HTMHAC"}}
{"Header":{"AssetClass":"Commodities","InstrumentType":"Option","UseCase":"Multi_Exotic_Option","Level":"InstRefDataReporting"},"Attributes":{"NotionalCurrency":"SOS","ExpiryDate":"2209-10-18","UnderlyingInstrumentIndex":["BCOMSI2","BCOMPR3T"],"OptionType":"CALL","OptionExerciseStyle":"AMER","ValuationMethodorTrigger":"Other Path Dependent","DeliveryType":"CASH","TransactionType":"ORIT","FinalPriceType":"BLTC"},"ISIN":{"ISIN":"EZ82L36B6225","Status":"New"},"TemplateVersion":1,"Derived":{"CommodityDerivativeIndicator":"TRUE","IssuerorOperatoroftheTradingVenueIdentifier":"NA","PriceMultiplier":1,"UnderlyingAssetType":"Multi Commodity","BaseProduct":"MCEX","SubProduct":"","AdditionalSubProduct":"","FullName":"Commodities Multi_Exotic_Option MCEX SOS 22091018","ShortName":"NA/O MCEX Call SOS 22091018","ClassificationType":"HTQBPC"}}

The files can be upwards of 1 GB in size.

For me to use these effectively, I need to turn the JSON lines into JSON Arrays, wrapping the file, prepending "[" and appending "]" and for each line separate with a comma (,).

Making the file look like this (in theory):

[
{
"Header": {
  "AssetClass": "Commodities",
  "InstrumentType": "Swap",
  "UseCase": "Basis_Swap",
  "Level": "InstRefDataReporting"
},
"Attributes": {
  "NotionalCurrency": "EUR",
  "ExpiryDate": "2017-08-31",
  "ReturnorPayoutTrigger": "Contract for Difference (CFD)",
  "DeliveryType": "CASH",
  "BaseProduct": "AGRI",
  "OtherBaseProduct": "AGRI",
  "TransactionType": "FUTR",
  "FinalPriceType": "ARGM",
  "ReferenceRate": "10PPM ULTRA LOW SULPHUR DIESEL-CARGOES CIF NWE/BASIS ARA-PLATTS EUROPEAN",
  "OtherReferenceRate": "10PPM ULTRA LOW SULPHUR DIESEL-CARGOES CIF NWE/BASIS ARA-PLATTS EUROPEAN",
  "SubProduct": "GROS",
  "AdditionalSubProduct": "FWHT",
  "OtherSubProduct": "GROS",
  "OtherAdditionalSubProduct": "FWHT"
},
"ISIN": {
  "ISIN": "EZ68CZDRFYY7",
  "Status": "New"
},
"TemplateVersion": 1,
"Derived": {
  "CommodityDerivativeIndicator": "TRUE",
  "UnderlyingAssetType": "Multi Commodity",
  "IssuerorOperatoroftheTradingVenueIdentifier": "NA",
  "PriceMultiplier": 1,
  "FullName": "Commodities Swap Basis_Swap AGRI GROS FWHT AGRI GROS FWHT EUR 20170831",
  "ShortName": "NA/Swap AGRI FWHT FWHT EUR 20170831",
  "ClassificationType": "STQCXC"
  }
},
{
"Header": {
  "AssetClass": "Commodities",
  "InstrumentType": "Swap",
  "UseCase": "Basis_Swap",
  "Level": "InstRefDataReporting"
},
"Attributes": {
  "NotionalCurrency": "EUR",
  "ExpiryDate": "2017-08-31",
  "ReturnorPayoutTrigger": "Contract for Difference (CFD)",
  "DeliveryType": "CASH",
  "BaseProduct": "AGRI",
  "OtherBaseProduct": "AGRI",
  "TransactionType": "FUTR",
  "FinalPriceType": "ARGM",
  "ReferenceRate": "10PPM ULTRA LOW SULPHUR DIESEL-CARGOES CIF NWE/BASIS ARA-PLATTS EUROPEAN",
  "OtherReferenceRate": "10PPM ULTRA LOW SULPHUR DIESEL-CARGOES CIF NWE/BASIS ARA-PLATTS EUROPEAN",
  "SubProduct": "GROS",
  "AdditionalSubProduct": "FWHT",
  "OtherSubProduct": "GROS",
  "OtherAdditionalSubProduct": "FWHT"
},
"ISIN": {
  "ISIN": "EZ68CZDRFYY7",
  "Status": "New"
},
"TemplateVersion": 1,
"Derived": {
  "CommodityDerivativeIndicator": "TRUE",
  "UnderlyingAssetType": "Multi Commodity",
  "IssuerorOperatoroftheTradingVenueIdentifier": "NA",
  "PriceMultiplier": 1,
  "FullName": "Commodities Swap Basis_Swap AGRI GROS FWHT AGRI GROS FWHT EUR 20170831",
  "ShortName": "NA/Swap AGRI FWHT FWHT EUR 20170831",
  "ClassificationType": "STQCXC"
}
}
]

So I've found jq and from my understanding i can run this

jq --slurp 'map(select(. >= 2))' Inputfile.json > OutputFile.json

This works but when running files larger than 200 MB i get "system out out of memory" error (if i use ISE) and if i use standard Powershell or CMD it takes ages. minutes (5+)

If i take --slurp out of the command, it works, it's quicker but the result looks like this:

[
{
"Header": {
  "AssetClass": "Commodities",
  "InstrumentType": "Swap",
  "UseCase": "Basis_Swap",
  "Level": "InstRefDataReporting"
},
"Attributes": {
  "NotionalCurrency": "EUR",
  "ExpiryDate": "2017-08-31",
  "ReturnorPayoutTrigger": "Contract for Difference (CFD)",
  "DeliveryType": "CASH",
  "BaseProduct": "AGRI",
  "OtherBaseProduct": "AGRI",
  "TransactionType": "FUTR",
  "FinalPriceType": "ARGM",
  "ReferenceRate": "10PPM ULTRA LOW SULPHUR DIESEL-CARGOES CIF NWE/BASIS ARA-PLATTS EUROPEAN",
  "OtherReferenceRate": "10PPM ULTRA LOW SULPHUR DIESEL-CARGOES CIF NWE/BASIS ARA-PLATTS EUROPEAN",
  "SubProduct": "GROS",
  "AdditionalSubProduct": "FWHT",
  "OtherSubProduct": "GROS",
  "OtherAdditionalSubProduct": "FWHT"
},
"ISIN": {
  "ISIN": "EZ68CZDRFYY7",
  "Status": "New"
},
"TemplateVersion": 1,
"Derived": {
  "CommodityDerivativeIndicator": "TRUE",
  "UnderlyingAssetType": "Multi Commodity",
  "IssuerorOperatoroftheTradingVenueIdentifier": "NA",
  "PriceMultiplier": 1,
  "FullName": "Commodities Swap Basis_Swap AGRI GROS FWHT AGRI GROS FWHT EUR 20170831",
  "ShortName": "NA/Swap AGRI FWHT FWHT EUR 20170831",
  "ClassificationType": "STQCXC"
  }
}]
[{
"Header": {
  "AssetClass": "Commodities",
  "InstrumentType": "Swap",
  "UseCase": "Basis_Swap",
  "Level": "InstRefDataReporting"
},
"Attributes": {
  "NotionalCurrency": "EUR",
  "ExpiryDate": "2017-08-31",
  "ReturnorPayoutTrigger": "Contract for Difference (CFD)",
  "DeliveryType": "CASH",
  "BaseProduct": "AGRI",
  "OtherBaseProduct": "AGRI",
  "TransactionType": "FUTR",
  "FinalPriceType": "ARGM",
  "ReferenceRate": "10PPM ULTRA LOW SULPHUR DIESEL-CARGOES CIF NWE/BASIS ARA-PLATTS EUROPEAN",
  "OtherReferenceRate": "10PPM ULTRA LOW SULPHUR DIESEL-CARGOES CIF NWE/BASIS ARA-PLATTS EUROPEAN",
  "SubProduct": "GROS",
  "AdditionalSubProduct": "FWHT",
  "OtherSubProduct": "GROS",
  "OtherAdditionalSubProduct": "FWHT"
},
"ISIN": {
  "ISIN": "EZ68CZDRFYY7",
  "Status": "New"
},
"TemplateVersion": 1,
"Derived": {
  "CommodityDerivativeIndicator": "TRUE",
  "UnderlyingAssetType": "Multi Commodity",
  "IssuerorOperatoroftheTradingVenueIdentifier": "NA",
  "PriceMultiplier": 1,
  "FullName": "Commodities Swap Basis_Swap AGRI GROS FWHT AGRI GROS FWHT EUR 20170831",
  "ShortName": "NA/Swap AGRI FWHT FWHT EUR 20170831",
  "ClassificationType": "STQCXC"
}
}
]

It creates an array for each line but the array is not comma separated, and thats not what I wanted.

So, how can I process large files, of multiple JSON lines, and create one file for the input file that is generated as a single array, comma separated, without using slurp?

I've read about inputs but not sure if that is relevant to what I need to do?

like image 292
JayBay2279 Avatar asked Aug 02 '17 09:08

JayBay2279


3 Answers

Maybe, I am late but this is what you are looking for!

jq -s '.' in.json > out.json
like image 192
Shahbaz Ali Avatar answered Sep 25 '22 02:09

Shahbaz Ali


It looks like inputs suffers from the same issue as slurp. I don't know how to accomplish this with jq on huge files, but sed can do it:

sed '1s/^/[/; $!s/$/,/; $s/$/]/' in.json > out.json

Output:

[{"Header":{"AssetClass":"Commodities","InstrumentType":"Forward","UseCase":"Forward","Level":"InstRefDataReporting"},"Attributes":{"NotionalCurrency":"EUR","ExpiryDate":"2018-01-01","ReturnorPayoutTrigger":"Contract for Difference (CFD)","DeliveryType":"CASH","BaseProduct":"AGRI","TransactionType":"FUTR","FinalPriceType":"ARGM","ReferenceRate":"10PPM ULTRA LOW SULPHUR DIESEL-CARGOES CIF NWE/BASIS ARA-PLATTS EUROPEAN","SubProduct":"GROS","AdditionalSubProduct":"FWHT"},"ISIN":{"ISIN":"EZX27M86B860","Status":"New"},"TemplateVersion":1,"Derived":{"CommodityDerivativeIndicator":"TRUE","UnderlyingAssetType":"Agriculture","IssuerorOperatoroftheTradingVenueIdentifier":"NA","PriceMultiplier":1,"FullName":"Commodities Forward AGRI GROS FWHT EUR 20180101","ShortName":"NA/Fwd AGRI FWHT EUR 20180101","ClassificationType":"JTAXCC"}},
{"Header":{"AssetClass":"Commodities","InstrumentType":"Swap","UseCase":"Basis_Swap","Level":"InstRefDataReporting"},"Attributes":{"NotionalCurrency":"SOS","ExpiryDate":"2208-12-10","ReturnorPayoutTrigger":"Total Return","DeliveryType":"OPTL","TransactionType":"ORIT","FinalPriceType":"IHSM","ReferenceRate":"NATURAL GAS-MONTHLY INDEX S. TEXAS (TETCO)-GAS DAILY PRICE GUIDE","OtherReferenceRate":"NATURAL GAS-MONTHLY INDEX W. LOUISIANA (TETCO)-GAS DAILY PRICE GUIDE","BaseProduct":"OTHR","OtherBaseProduct":"OTHR","SubProduct":"","AdditionalSubProduct":"","OtherSubProduct":"","OtherAdditionalSubProduct":""},"ISIN":{"ISIN":"EZBBH1XR9GV6","Status":"New"},"TemplateVersion":1,"Derived":{"CommodityDerivativeIndicator":"TRUE","UnderlyingAssetType":"Multi Commodity","IssuerorOperatoroftheTradingVenueIdentifier":"NA","PriceMultiplier":1,"FullName":"Commodities Swap Basis_Swap OTHR   OTHR   SOS 22081210","ShortName":"NA/Swap OTHR   SOS 22081210","ClassificationType":"STQTXE"}},
{"Header":{"AssetClass":"Commodities","InstrumentType":"Swap","UseCase":"Multi_Exotic_Swap","Level":"InstRefDataReporting"},"Attributes":{"NotionalCurrency":"LRD","ExpiryDate":"2200-01-31","ReturnorPayoutTrigger":"Contract for Difference (CFD)","DeliveryType":"CASH","TransactionType":"TAPO","FinalPriceType":"EXOF","UnderlyingInstrumentIndex":["BCOMF6","BCOMNG3"]},"ISIN":{"ISIN":"EZ286HJVY4Q2","Status":"New"},"TemplateVersion":1,"Derived":{"CommodityDerivativeIndicator":"TRUE","IssuerorOperatoroftheTradingVenueIdentifier":"NA","PriceMultiplier":1,"UnderlyingAssetType":"Multi Commodity","BaseProduct":"MCEX","SubProduct":"","AdditionalSubProduct":"","FullName":"Commodities Multi_Exotic_Swap MCEX LRD 22000131","ShortName":"NA/Swap MCEX LRD 22000131","ClassificationType":"STQCXC"}},
{"Header":{"AssetClass":"Commodities","InstrumentType":"Option","UseCase":"Option","Level":"InstRefDataReporting"},"Attributes":{"NotionalCurrency":"TND","ExpiryDate":"2209-10-18","OptionType":"OPTL","OptionExerciseStyle":"AMER","ValuationMethodorTrigger":"Asian","DeliveryType":"CASH","TransactionType":"OTHR","FinalPriceType":"IHSM","ReferenceRate":"NATURAL GAS-NGPL (NICOR, NIPSCO, PGLC CITYGATE), NBPL-NICOR-ICE/10X MONTHLY","BaseProduct":"OTHR","SubProduct":"","AdditionalSubProduct":""},"ISIN":{"ISIN":"EZ2TK5CWL9Y4","Status":"New"},"TemplateVersion":1,"Derived":{"CommodityDerivativeIndicator":"TRUE","UnderlyingAssetType":"Other","IssuerorOperatoroftheTradingVenueIdentifier":"NA","PriceMultiplier":1,"FullName":"Commodities Option OTHR   TND 22091018","ShortName":"NA/O OTHR  OPTL TND 22091018","ClassificationType":"HTMHAC"}},
{"Header":{"AssetClass":"Commodities","InstrumentType":"Option","UseCase":"Multi_Exotic_Option","Level":"InstRefDataReporting"},"Attributes":{"NotionalCurrency":"SOS","ExpiryDate":"2209-10-18","UnderlyingInstrumentIndex":["BCOMSI2","BCOMPR3T"],"OptionType":"CALL","OptionExerciseStyle":"AMER","ValuationMethodorTrigger":"Other Path Dependent","DeliveryType":"CASH","TransactionType":"ORIT","FinalPriceType":"BLTC"},"ISIN":{"ISIN":"EZ82L36B6225","Status":"New"},"TemplateVersion":1,"Derived":{"CommodityDerivativeIndicator":"TRUE","IssuerorOperatoroftheTradingVenueIdentifier":"NA","PriceMultiplier":1,"UnderlyingAssetType":"Multi Commodity","BaseProduct":"MCEX","SubProduct":"","AdditionalSubProduct":"","FullName":"Commodities Multi_Exotic_Option MCEX SOS 22091018","ShortName":"NA/O MCEX Call SOS 22091018","ClassificationType":"HTQBPC"}}]

Explanation

The sed script consists of three separate substitutions. Here they are on separate lines:

1  s/^/[/      # Insert a left bracket at the beginning of the first line
$! s/$/,/      # On all but the last line append a comma
$  s/$/]/      # Append a right bracket to the last line
like image 37
Thor Avatar answered Sep 22 '22 02:09

Thor


Maybe awk is easier on the eyes:

awk 'BEGIN{print "["} 
 length(last)>0 {print last ","} {last=$0}
 END {print last, "]"}'

And for the record, here's a slurp-free solution using a version of jq with input:

jq -nr '"[", try (input|tojson, repeat(",\n\(input|tojson)")), "]"'
like image 42
peak Avatar answered Sep 26 '22 02:09

peak