I have a nested dictionary whose sub-dictionaries hold lists:
nested_dict = {'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
               'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]}, ... }
Each list in the sub-dictionaries has at least two elements, but there could be more.
I would like to "unfold" this dictionary into a pandas DataFrame, with one column for the first dictionary keys (e.g. 'string1', 'string2', ..), one column for the sub-dictionary keys, one column for the first item in the list, one column for the next item, and so on.
Here is what the output should look like:
col1 col2 col3 col4 col5 col6
string1 69 1231 232
string1 67 682 12
string1 65 1 1
string2 28672 82 23
string2 22736 82 93 1102 102
string2 19423 64 23
Naturally, I tried to use pd.DataFrame.from_dict:
new_df = pd.DataFrame.from_dict({(i,j): nested_dict[i][j]
                                 for i in nested_dict.keys()
                                 for j in nested_dict[i].keys()
                                 ...
Now I'm stuck, and there are several open problems:
How do I parse the lists (i.e. the nested_dict[i].values()) such that each element becomes a new pandas DataFrame column?
The above will not actually create a column for each field.
The above will not fill up the columns with elements, e.g. string1 should appear in every row for its sub-dictionary key-value pairs. (For col5 and col6, I can fill the NA with zeros.)
I'm not sure how to name these columns correctly.
This should give you the result you are looking for, although it's probably not the most elegant solution. There's probably a better (more pandas-like) way to do it.
I parsed your nested dict and built a list of dictionaries (one for each row).
import pandas as pd

# some sample input
nested_dict = {
    'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
    'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]},
    'string3': {28673: [83, 24], 22737: [83, 94, 1103, 103], 19424: [65, 24]}
}
# new_list is what we will use to hold each row
new_list = []
for k1 in nested_dict:
    curr_dict = nested_dict[k1]
    for k2 in curr_dict:
        # the outer and inner keys fill the first two columns
        new_dict = {'col1': k1, 'col2': k2}
        # each list element gets its own column, starting at col3
        new_dict.update({'col%d' % (i + 3): curr_dict[k2][i] for i in range(len(curr_dict[k2]))})
        new_list.append(new_dict)
# create a DataFrame from new_list
df = pd.DataFrame(new_list)
The output:
col1 col2 col3 col4 col5 col6
0 string2 28672 82 23 NaN NaN
1 string2 22736 82 93 1102.0 102.0
2 string2 19423 64 23 NaN NaN
3 string3 19424 65 24 NaN NaN
4 string3 28673 83 24 NaN NaN
5 string3 22737 83 94 1103.0 103.0
6 string1 65 1 1 NaN NaN
7 string1 67 682 12 NaN NaN
8 string1 69 1231 232 NaN NaN
This assumes that the input always contains enough data to create col1 and col2.
I loop through nested_dict, assuming that each of its values is also a dictionary. We then loop through that dictionary (curr_dict) as well. The keys k1 and k2 populate col1 and col2. For the remaining columns, we iterate through the list contents and add a column for each element.
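The question also mentions filling the missing col5 and col6 values with zeros. A minimal sketch of that, assuming a two-level dictionary like the sample above; it builds plain lists instead of dicts and names the columns afterwards. The names rows and flat_df are only illustrative, and the cast back to int assumes every list value is an integer.

```python
import pandas as pd

nested_dict = {
    'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
    'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]},
}

# One row per (outer key, inner key) pair, followed by the list elements.
rows = [[outer, inner, *values]
        for outer, sub in nested_dict.items()
        for inner, values in sub.items()]

# Rows of unequal length are padded with missing values by pandas.
flat_df = pd.DataFrame(rows)

# Name the columns col1, col2, ... based on how many there ended up being.
flat_df.columns = ['col%d' % (i + 1) for i in range(flat_df.shape[1])]

# Replace the padding with zeros, then cast the value columns back to int
# (assumption: all list values are integers).
flat_df = flat_df.fillna(0)
flat_df = flat_df.astype({c: int for c in flat_df.columns[2:]})

print(flat_df)
```

Without the final astype call, the padded columns stay as floats, as they do in the output shown above.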
Here's a method which uses a recursive generator to unroll the nested dictionaries. It won't assume that you have exactly two levels, but continues unrolling each dict until it hits a list.
import pandas as pd

nested_dict = {
    'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
    'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]},
    'string3': [101, 102]}

def unroll(data):
    if isinstance(data, dict):
        for key, value in data.items():
            # Recursively unroll the next level and prepend the key to each row.
            for row in unroll(value):
                yield [key] + row
    if isinstance(data, list):
        # This is the bottom of the structure (defines exactly one row).
        yield data

df = pd.DataFrame(list(unroll(nested_dict)))
Because unroll produces a list of lists rather than dicts, the columns will be named numerically (from 0 to 5 in this case). So you need to use rename to get the column labels you want:
df.rename(columns=lambda i: 'col{}'.format(i+1))
This returns the following result (note that the additional string3 entry is also unrolled).
col1 col2 col3 col4 col5 col6
0 string1 69 1231 232.0 NaN NaN
1 string1 67 682 12.0 NaN NaN
2 string1 65 1 1.0 NaN NaN
3 string2 28672 82 23.0 NaN NaN
4 string2 22736 82 93.0 1102.0 102.0
5 string2 19423 64 23.0 NaN NaN
6 string3 101 102 NaN NaN NaN
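To illustrate the point about not being limited to two levels, here is a small sketch that runs the same generator on a three-level dictionary. The deep input is made up for the example and is not from the question. Note that rename returns a new DataFrame, so the result is assigned back.

```python
import pandas as pd

def unroll(data):
    if isinstance(data, dict):
        for key, value in data.items():
            # Prepend the current key to every row produced further down.
            for row in unroll(value):
                yield [key] + row
    elif isinstance(data, list):
        # A list is the bottom of the structure and defines one row.
        yield data

# Hypothetical three-level input, for illustration only.
deep = {
    'a': {'x': {1: [10, 20]}, 'y': {2: [30, 40, 50]}},
    'b': {'z': {3: [60]}},
}

rows = list(unroll(deep))
deep_df = pd.DataFrame(rows)
# rename does not modify the frame in place, so keep the returned copy.
deep_df = deep_df.rename(columns=lambda i: 'col{}'.format(i + 1))

print(deep_df)
```

Each extra nesting level simply adds one more key column before the list values.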