I have a dataframe like this.
index column1
e1 {u'c680': 5, u'c681': 1, u'c682': 2, u'c57...
e2 {u'c680': 6, u'c681': 2, u'c682': 1, u'c57...
e3 {u'c680': 2, u'c681': 4, u'c682': 2, u'c57...
e4 {u'c680': 4, u'c681': 2, u'c682': 3, u'c57...
e5 {u'c680': 3, u'c681': 5, u'c683': 3, u'c57...
Now I want to expand the dict in column1 to individual columns like below.
index c680 c681 c682 c683
e1 5 1 2 0
e2 6 2 1 0
e3 2 4 2 0
e4 4 2 3 0
e5 3 5 0 3
Is there a pandas shortcut that can achieve this?
The best approach here is not apply(pd.Series), which is very slow, but the DataFrame constructor, followed by converting the NaNs to 0 and then casting to ints:
df = pd.DataFrame({'column1': [{'c681': 1, 'c682': 2, 'c57': 4, 'c680': 5},
{'c681': 2, 'c682': 1, 'c57': 7, 'c680': 6},
{'c681': 4, 'c682': 2, 'c57': 8, 'c680': 2},
{'c681': 2, 'c682': 3, 'c57': 1, 'c680': 4},
{'c683': 3, 'c681': 5, 'c57': 0, 'c680': 3}]},
index=['e1','e2','e3','e4','e5'])
print (df)
column1
e1 {'c680': 5, 'c682': 2, 'c57': 4, 'c681': 1}
e2 {'c680': 6, 'c682': 1, 'c57': 7, 'c681': 2}
e3 {'c680': 2, 'c682': 2, 'c57': 8, 'c681': 4}
e4 {'c680': 4, 'c682': 3, 'c57': 1, 'c681': 2}
e5 {'c683': 3, 'c680': 3, 'c57': 0, 'c681': 5}
# assign to a new name so df['column1'] is still available for the timings below
df1 = pd.DataFrame(df['column1'].values.tolist(), index=df.index).fillna(0).astype(int)
print (df1)
c57 c680 c681 c682 c683
e1 4 5 1 2 0
e2 7 6 2 1 0
e3 8 2 4 2 0
e4 1 4 2 3 0
e5 0 3 5 0 3
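To see why the fillna(0).astype(int) step is needed, here is a minimal sketch on a two-row Series. The data is made up for illustration and only modelled on the question. A key missing from one dict becomes NaN, which forces that column to float:

```python
import pandas as pd

# Illustrative data: 'c682' is missing from the second dict
# and 'c683' is missing from the first.
s = pd.Series([{'c680': 5, 'c682': 2}, {'c680': 3, 'c683': 3}],
              index=['e1', 'e5'])

# The constructor aligns the dict keys into columns; gaps become NaN.
raw = pd.DataFrame(s.tolist(), index=s.index)
print(raw.dtypes)   # columns that contain NaN are stored as float64

# Replace the gaps with 0, then cast everything back to integers.
clean = raw.fillna(0).astype(int)
print(clean)
```

If 0 is not a sensible default for a missing key in your data, you can skip the last step and keep the NaNs instead.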
df = pd.concat([df] * 1000, ignore_index=True)
In [108]: %timeit (pd.DataFrame(df['column1'].values.tolist(), index=df.index))
100 loops, best of 3: 10.1 ms per loop
In [109]: %timeit (df['column1'].apply(pd.Series))
1 loop, best of 3: 1.14 s per loop
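If you want to repeat the comparison outside IPython, something like the sketch below should work with the standard timeit module. The sample records and the helper names via_constructor and via_apply are hypothetical, and the absolute numbers will depend on your machine and pandas version:

```python
import timeit
import pandas as pd

# Hypothetical sample data: 1000 rows of small dicts with differing keys.
records = [{'c680': 5, 'c681': 1}, {'c680': 3, 'c683': 3}] * 500
big = pd.DataFrame({'column1': records})

def via_constructor(frame):
    # Build the wide frame in one go from a list of dicts.
    return pd.DataFrame(frame['column1'].tolist(), index=frame.index)

def via_apply(frame):
    # Build one Series per row, then stack them (the slow variant).
    return frame['column1'].apply(pd.Series)

t_ctor = timeit.timeit(lambda: via_constructor(big), number=3)
t_apply = timeit.timeit(lambda: via_apply(big), number=3)
print(f"constructor: {t_ctor:.4f}s  apply(pd.Series): {t_apply:.4f}s")
```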