Setup
A dictionary of the following structural form:
subnetwork_dct = {518418568: {2: (478793912, 518418568, 518758448),
                              3: (478793912, 518418568, 518758448, 1037590624),
                              4: (478793912, 518418568, 518758448, 1037590624)},
                  552214776: {2: (431042800, 552214776),
                              3: (431042800,)},
                  993280096: {2: (456917000, 993280096),
                              3: (456917000, 993280096),
                              4: (456917000, 993280096)}}
Expected Output
A Pandas DataFrame following the below schema:
0 1 2
518418568 2 478793912
518418568 2 518418568
518418568 2 518758448
518418568 3 478793912
518418568 3 518418568
518418568 3 518758448
518418568 3 1037590624
518418568 4 478793912
518418568 4 518418568
518418568 4 518758448
518418568 4 1037590624
552214776 2 431042800
552214776 2 552214776
552214776 3 431042800
...
Working solution:
My current approach works, but I wonder if there's a cleaner solution?
import pandas as pd

# Flatten the two key levels into (outer, inner) tuple keys
multi_index_dct = {(k1, k2): v2
                   for k1, v1 in subnetwork_dct.items()
                   for k2, v2 in v1.items()}
keys = sorted(multi_index_dct)
# One row per (outer, inner) key; shorter tuples are padded with NaN
df = pd.DataFrame([multi_index_dct[k] for k in keys],
                  index=pd.MultiIndex.from_tuples(keys))
# Move the tuple elements into rows, then discard the positional level
df_stacked = pd.DataFrame(df.stack()).reset_index()
df_stacked.drop('level_2', axis=1, inplace=True)
df_stacked.columns = [0, 1, 2]
df_stacked
We first take the nested dictionary and extract the rows of data from it. A for loop then appends each row to a list that starts out empty. Finally, we pass that list to the pandas DataFrame constructor to create the DataFrame.
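The loop-based approach described above could look roughly like this. It is a minimal sketch that uses one entry of the question's dictionary; the variable names (rows, outer_key, inner_key) are illustrative and not from the original post.

```python
import pandas as pd

# One entry from the question's dictionary, to keep the example short
subnetwork_dct = {
    552214776: {2: (431042800, 552214776),
                3: (431042800,)},
}

rows = []  # starts empty and is filled one row at a time
for outer_key, inner_dct in subnetwork_dct.items():
    for inner_key, values in inner_dct.items():
        for value in values:
            # One output row per element of each tuple
            rows.append((outer_key, inner_key, value))

# A list of 3-tuples becomes a DataFrame with default column labels 0, 1, 2
df = pd.DataFrame(rows)
print(df)
```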
Pandas DataFrame: transpose() function. The transpose() function transposes the index and columns, reflecting the DataFrame over its main diagonal by writing rows as columns and vice versa. If its copy argument is True, the underlying data is copied; otherwise (the default), no copy is made if possible.
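As a small illustration of transpose(), the sketch below uses a made-up two-by-two frame; the labels a, b, x and y are hypothetical and only serve to show which axis ends up where.

```python
import pandas as pd

# Hypothetical frame: columns "a" and "b", rows "x" and "y"
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# Rows become columns and columns become rows (df.T is shorthand for this)
transposed = df.transpose()
print(transposed)
```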
To make a column the index, use the pandas set_index() function. To use a single column as the index, pass its name as a string to set_index(). For multi-indexing (hierarchical indexing), pass a list of column names instead.
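Both forms of set_index() can be sketched on a few rows taken from the expected output. The column names outer, inner and value are assumptions made for this example; the question itself only uses the labels 0, 1 and 2.

```python
import pandas as pd

# Three rows from the expected output, with hypothetical column names
df = pd.DataFrame(
    [(552214776, 2, 431042800),
     (552214776, 2, 552214776),
     (552214776, 3, 431042800)],
    columns=["outer", "inner", "value"],
)

# A single column name (string) gives a flat index
single = df.set_index("outer")

# A list of column names gives a hierarchical MultiIndex
multi = df.set_index(["outer", "inner"])
print(multi)
```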
Try explode (available in pandas 0.25 and later):
pd.DataFrame(subnetwork_dct).stack().explode().reset_index()
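One thing to watch with the one-liner above: pd.DataFrame(subnetwork_dct) puts the outer keys in the columns and the inner keys (2, 3, 4) in the index, so after stack() and reset_index() the inner key comes first. The sketch below is one possible way to get back to the column order shown in the question. The dropna() call is a precaution, on the assumption that some pandas versions keep the NaN that stack() produces for the missing inner key 4 of 552214776.

```python
import pandas as pd

subnetwork_dct = {518418568: {2: (478793912, 518418568, 518758448),
                              3: (478793912, 518418568, 518758448, 1037590624),
                              4: (478793912, 518418568, 518758448, 1037590624)},
                  552214776: {2: (431042800, 552214776),
                              3: (431042800,)},
                  993280096: {2: (456917000, 993280096),
                              3: (456917000, 993280096),
                              4: (456917000, 993280096)}}

stacked = (pd.DataFrame(subnetwork_dct)
             .stack()
             .dropna()       # precaution for missing inner keys
             .explode()
             .reset_index())

# After reset_index the columns are: inner key, outer key, value.
# Relabel them so that the outer key is 0 and the inner key is 1.
stacked.columns = [1, 0, 2]

# Reorder the columns and group the rows by outer key, then inner key
result = (stacked[[0, 1, 2]]
          .sort_values([0, 1])
          .reset_index(drop=True))
print(result)
```

The value column comes out with object dtype after explode(), so cast it with astype(int) if integer dtype matters.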
Or use a nested comprehension:
pd.DataFrame([
    (k0, k1, v)
    for k0, d in subnetwork_dct.items()
    for k1, V in d.items()
    for v in V
])
            0  1           2
0   518418568  2   478793912
1   518418568  2   518418568
2   518418568  2   518758448
3   518418568  3   478793912
4   518418568  3   518418568
5   518418568  3   518758448
6   518418568  3  1037590624
7   518418568  4   478793912
8   518418568  4   518418568
9   518418568  4   518758448
10  518418568  4  1037590624
11  552214776  2   431042800
12  552214776  2   552214776
13  552214776  3   431042800
14  993280096  2   456917000
15  993280096  2   993280096
16  993280096  3   456917000
17  993280096  3   993280096
18  993280096  4   456917000
19  993280096  4   993280096