I have a DataFrame in which one of the columns contains dictionaries:
import pandas as pd
import numpy as np
def generate_dict():
    return {'var1': np.random.rand(), 'var2': np.random.rand()}
data = {}
data[0] = {}
data[1] = {}
data[0]['A'] = generate_dict()
data[1]['A'] = generate_dict()
df = pd.DataFrame.from_dict(data, orient='index')
I would like to unpack the key/value pairs in the dictionary into a new DataFrame, where each entry has its own row. I can do that by iterating over the rows and appending to a new DataFrame:
def expand_row(row):
    df_t = pd.DataFrame.from_dict({'value': row.A})
    df_t.index.rename('row', inplace=True)
    df_t.reset_index(inplace=True)
    df_t['column'] = 'A'
    return df_t
df_expanded = pd.DataFrame([])
for _, row in df.iterrows():
    T = expand_row(row)
    df_expanded = df_expanded.append(T, ignore_index=True)
This is rather slow, and my application is performance critical. I think this is possible with df.apply. However, since my function returns a DataFrame instead of a Series, simply doing
df_expanded = df.apply(expand_row)
doesn't quite work. What would be the most performant way to do this?
Thanks in advance.
You can use a nested list comprehension and then replace column 0 with the constant A (the column name):
d = df.A.to_dict()
df1 = pd.DataFrame([(key,key1,val1) for key,val in d.items() for key1,val1 in val.items()])
df1[0] = 'A'
df1.columns = ['columns','row','value']
print (df1)
  columns   row     value
0       A  var1  0.013872
1       A  var2  0.192230
2       A  var1  0.176413
3       A  var2  0.253600
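For reference, here is a minimal self-contained sketch of the same list-comprehension idea. It assumes fixed values instead of the random ones in the question so that the result is reproducible, and it wraps the logic in a helper named `expand_dict_column`, which is a hypothetical name chosen for illustration. Rather than overwriting column 0 afterwards, it writes the column name into each record directly.

```python
import pandas as pd

# Assumed fixed values (the question uses np.random.rand()).
df = pd.DataFrame({'A': [{'var1': 0.1, 'var2': 0.2},
                         {'var1': 0.3, 'var2': 0.4}]})

def expand_dict_column(frame, col):
    """Flatten one dict-valued column into (columns, row, value) records."""
    # Outer loop: one dict per DataFrame row; inner loop: its key/value pairs.
    records = [(col, key, value)
               for cell in frame[col]
               for key, value in cell.items()]
    return pd.DataFrame(records, columns=['columns', 'row', 'value'])

df1 = expand_dict_column(df, 'A')
print(df1)
```

Building all records in one list and constructing the DataFrame once avoids growing a DataFrame row by row, which is the slow part of the loop in the question.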
Another solution:
df1 = pd.DataFrame.from_records(df.A.values.tolist()).stack().reset_index()
df1['level_0'] = 'A'
df1.columns = ['columns','row','value']
print (df1)
  columns   row     value
0       A  var1  0.332594
1       A  var2  0.118967
2       A  var1  0.374482
3       A  var2  0.263910
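If you want to convince yourself that the two approaches above give the same rows, a quick check along these lines may help. It is only a sketch: the fixed input values and the names `by_comprehension` and `by_stack` are assumptions made for illustration, and it assumes every dictionary has the same keys.

```python
import pandas as pd

# Assumed fixed values so both results can be compared exactly.
df = pd.DataFrame({'A': [{'var1': 0.1, 'var2': 0.2},
                         {'var1': 0.3, 'var2': 0.4}]})

# Approach 1: nested list comprehension, then overwrite column 0 with 'A'.
d = df.A.to_dict()
by_comprehension = pd.DataFrame(
    [(key, key1, val1) for key, val in d.items() for key1, val1 in val.items()])
by_comprehension[0] = 'A'
by_comprehension.columns = ['columns', 'row', 'value']

# Approach 2: from_records + stack, then overwrite 'level_0' with 'A'.
by_stack = pd.DataFrame.from_records(df.A.values.tolist()).stack().reset_index()
by_stack['level_0'] = 'A'
by_stack.columns = ['columns', 'row', 'value']

# Compare the plain row values rather than dtypes or index metadata.
same_rows = by_comprehension.values.tolist() == by_stack.values.tolist()
print(same_rows)
```

Which of the two is faster on your data is best measured directly, for example with `timeit` on a realistically sized DataFrame.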