I am trying to convert JSON to CSV file, that I can use for further analysis. Issue with my structure is that I have quite some nested dict/lists when I convert my JSON file.
I tried to use pandas json_normalize()
, but it only flattens first level.
import json import pandas as pd from pandas.io.json import json_normalize from cs import CloudStack api_key = xxxx secret = xxxx endpoint = xxxx cs = CloudStack(endpoint=endpoint, key=api_key, secret=secret) virtual_machines = cs.virtMach() test = json_normalize(virtual_machines["virtualmachine"]) test.to_csv("test.csv", sep="|", index=False)
Any idea how to flatter whole JSON file, so I can create single line input to CSV file for single (in this case virtual machine) entry? I have tried couple of solutions posted here, but my result was always only first level was flattened.
This is sample JSON (in this case, I still get "securitygroup" and "nic" output as JSON format:
{ "count": 13, "virtualmachine": [ { "id": "1082e2ed-ff66-40b1-a41b-26061afd4a0b", "name": "test-2", "displayname": "test-2", "securitygroup": [ { "id": "9e649fbc-3e64-4395-9629-5e1215b34e58", "name": "test", "tags": [] } ], "nic": [ { "id": "79568b14-b377-4d4f-b024-87dc22492b8e", "networkid": "05c0e278-7ab4-4a6d-aa9c-3158620b6471" }, { "id": "3d7f2818-1f19-46e7-aa98-956526c5b1ad", "networkid": "b4648cfd-0795-43fc-9e50-6ee9ddefc5bd" "traffictype": "Guest" } ], "hypervisor": "KVM", "affinitygroup": [], "isdynamicallyscalable": false } ] }
There are many ways to flatten JSON. There is one recursive way and another by using the json-flatten library. Now we can flatten the dictionary array by a recursive approach which is quite easy to understand. The recursive approach is a bit slower than using the json-flatten library.
Use the Flatten JSON Objects extension to convert a nested data layer object into a new object with only one layer of key/value pairs. For more advanced options in converting a data layer, see the Data Layer Converter.
I used the following function (details can be found here):
def flatten_data(y): out = {} def flatten(x, name=''): if type(x) is dict: for a in x: flatten(x[a], name + a + '_') elif type(x) is list: i = 0 for a in x: flatten(a, name + str(i) + '_') i += 1 else: out[name[:-1]] = x flatten(y) return out
This unfortunately completely flattens whole JSON, meaning that if you have multi-level JSON (many nested dictionaries), it might flatten everything into single line with tons of columns.
What I used, in the end, was json_normalize()
and specified structure that I required. A nice example of how to do it that way can be found here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With