I am looking for advice on how to cleanly convert a multi-level nested Python dictionary (loaded from JSON) into a boolean pandas DataFrame.
Rules:
Example Input:
{1:{'group_a':{'bool_a':True,
               'bool_b':True,
               'bool_n':True},
    'group_n':{'bool_b':True,
               'bool_n':True}
   },
 2:{'group_a':{'bool_a':True,
               'bool_b':True,
               'bool_n':True},
    'group_n':{'bool_b':True,
               'bool_n':True}
   },
'n':{'group_a':{'bool_a':True,
                'bool_c':True},
     'group_n':{'bool_b':True}
   },
}
Desired Output:
   Ga_Ba  Ga_Bb  Ga_Bc  Ga_Bn  Gn_Ba  Gn_Bb  ...  Gn_Bn  ...
1  True   True   False  True   False  True        True
2  True   True   False  True   False  True        True
n  True   False  True   False  False  True        False
...
Ideas? Bonus points for speed and conciseness. I have a solution, but I am looking for something more elegant than the for-loop mess I have now. Alternative data structures are also welcome.
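You could stack the dictionary into a Series and flatten it with json_normalize (data is the dictionary from the question):
import pandas as pd
# One entry per (key, group) pair; each value is still the inner dict of flags.
s = pd.DataFrame.from_dict(data, orient='index').stack()
# Expand the inner dicts into columns, then pivot group and flag names into the column index.
pd.json_normalize(s).set_index(s.index) \
  .stack().unstack([1, 2], fill_value=False) \
  .sort_index(axis=1)
Output: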
  group_a                      group_n       
   bool_a bool_b bool_c bool_n  bool_b bool_n
1    True   True  False   True    True   True
2    True   True  False   True    True   True
3    True  False   True  False    True  False
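The table above has two-level columns, while the question asks for flat labels such as Ga_Ba. A minimal sketch of collapsing them, assuming every group and flag name ends in an underscore-separated suffix. The small frame is a stand-in for the real result, and short_label is a hypothetical helper, not part of pandas:

```python
import pandas as pd

# Stand-in for a result with (group, flag) MultiIndex columns.
columns = pd.MultiIndex.from_tuples(
    [("group_a", "bool_a"), ("group_a", "bool_c"), ("group_n", "bool_b")]
)
df = pd.DataFrame(
    [[True, False, True], [True, True, True]],
    index=["1", "n"],
    columns=columns,
)


def short_label(group, flag):
    # Hypothetical helper: ("group_a", "bool_c") -> "Ga_Bc".
    # Keeps only the part after the last underscore of each name.
    return f"G{group.rsplit('_', 1)[-1]}_B{flag.rsplit('_', 1)[-1]}"


flat = df.copy()
# Iterating over MultiIndex columns yields (group, flag) tuples.
flat.columns = [short_label(group, flag) for group, flag in df.columns]
print(flat)
```

The values are untouched; only the column index changes from a MultiIndex to plain strings.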
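Alternatively, build the flat column names directly with a nested dictionary comprehension:
pd.DataFrame.from_dict({
    k0: {
        # 'group_a' and 'bool_b' become 'Ga_Bb'
        f'G{k1.split("_")[1]}_B{k2.split("_")[1]}': val
        for k1, d1 in d0.items()
        for k2, val in d1.items()
    }
    for k0, d0 in data.items()
}, orient='index').fillna(False)
Output: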
   Ga_Ba  Ga_Bb  Ga_Bn  Gn_Bb  Gn_Bn  Ga_Bc
1   True   True   True   True   True  False
2   True   True   True   True   True  False
3   True  False  False   True  False   True
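Note that the desired output in the question lists Gn_Ba, a combination that never occurs in the data, so it does not appear in this result. One possible way to get every group/flag combination is to reindex the columns against the full product of names. This is only a sketch: it assumes string keys (as in the sample data further down), and short_label, group_names, flag_names and all_columns are illustrative names, not anything from the original answers.

```python
from itertools import product

import pandas as pd

# Sample data from the question, with string keys to avoid a mixed-type index.
data = {
    "1": {"group_a": {"bool_a": True, "bool_b": True, "bool_n": True},
          "group_n": {"bool_b": True, "bool_n": True}},
    "2": {"group_a": {"bool_a": True, "bool_b": True, "bool_n": True},
          "group_n": {"bool_b": True, "bool_n": True}},
    "n": {"group_a": {"bool_a": True, "bool_c": True},
          "group_n": {"bool_b": True}},
}


def short_label(group, flag):
    # Hypothetical helper: ("group_a", "bool_b") -> "Ga_Bb".
    return f"G{group.split('_')[1]}_B{flag.split('_')[1]}"


# Same idea as the comprehension above: one flat dict of labels per key.
flat = pd.DataFrame.from_dict(
    {
        key: {
            short_label(group, flag): val
            for group, flags in groups.items()
            for flag, val in flags.items()
        }
        for key, groups in data.items()
    },
    orient="index",
)

# Collect every group name and every flag name seen anywhere in the data.
group_names = sorted({g for groups in data.values() for g in groups})
flag_names = sorted(
    {f for groups in data.values() for flags in groups.values() for f in flags}
)
all_columns = [short_label(g, f) for g, f in product(group_names, flag_names)]

# Missing combinations become NaN on reindex, then False; astype gives real bools.
full = flat.reindex(columns=all_columns).fillna(False).astype(bool)
print(full)
```

The final astype(bool) is there because filling NaN in a column that mixes True and NaN can otherwise leave the column with object dtype.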
You could use a dictionary comprehension and concat:
import pandas as pd
values = {
    "1": {
        "group_a": {"bool_a": True, "bool_b": True, "bool_n": True},
        "group_n": {"bool_b": True, "bool_n": True},
    },
    "2": {
        "group_a": {"bool_a": True, "bool_b": True, "bool_n": True},
        "group_n": {"bool_b": True, "bool_n": True},
    },
    "n": {"group_a": {"bool_a": True, "bool_c": True}, "group_n": {"bool_b": True}},
}
stacked_values = {k: pd.DataFrame(v).stack() for k, v in values.items()}
df = (
    pd.concat(stacked_values, axis=1)
    .T.fillna(False)
    .swaplevel(axis=1)  # optional
    .sort_index(axis=1)
)
Output:
  group_a                      group_n       
   bool_a bool_b bool_c bool_n  bool_b bool_n
1    True   True  False   True    True   True
2    True   True  False   True    True   True
n    True  False   True  False    True  False
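If flat column names are acceptable, a shorter route might be to hand the inner dictionaries straight to pd.json_normalize, which flattens nested keys on its own. The following is a sketch under that assumption, reusing the values dictionary from above; the "." separator is just the default, and the resulting names (for example group_a.bool_a) differ from the Ga_Ba style in the question.

```python
import pandas as pd

values = {
    "1": {
        "group_a": {"bool_a": True, "bool_b": True, "bool_n": True},
        "group_n": {"bool_b": True, "bool_n": True},
    },
    "2": {
        "group_a": {"bool_a": True, "bool_b": True, "bool_n": True},
        "group_n": {"bool_b": True, "bool_n": True},
    },
    "n": {"group_a": {"bool_a": True, "bool_c": True}, "group_n": {"bool_b": True}},
}

# Each inner dict becomes one row; nested keys are joined with the separator.
flat = pd.json_normalize(list(values.values()), sep=".")
# json_normalize does not know the outer keys, so restore them as the index.
flat.index = list(values.keys())
# Absent flags come back as NaN; turn them into False and force bool dtype.
flat = flat.fillna(False).astype(bool).sort_index(axis=1)
print(flat)
```

This avoids building one intermediate DataFrame per key, though whether it is actually faster on large inputs would need to be measured.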