Given a DataFrame df1 as follows:
df1 = pd.DataFrame({
'col1': [1,2,3,4],
'col2': [['a', 'b'], ['c'], ['a', 'd', 'b'], ['e']]
})
Which looks like:
col1 col2
0 1 [a, b]
1 2 [c]
2 3 [a, d, b]
3 4 [e]
I want to convert col2 - a column where each cell is a list - into several columns (a, b, c, d, e), where each value is a boolean indicating whether that column name appeared in the row's original list.
The output should follow this form:
df2 = pd.DataFrame({
'col1': [1,2,3,4],
'a': [True, False, True, False],
'b': [True, False, True, False],
'c': [False, True, False, False],
'd': [False, False, True, False],
'e': [False, False, False, True]
})
Which looks like:
col1 a b c d e
0 1 True True False False False
1 2 False False True False False
2 3 True True False True False
3 4 False False False False True
What's a clean way to do this?
An alternative approach using str.get_dummies, which is considerably faster than apply + pd.Series (see the timings below):
df1['col2'].str.join(',').str.get_dummies(sep=',').astype(bool)
a b c d e
0 True True False False False
1 False False True False False
2 True True False True False
3 False False False False True
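As a self-contained sketch (rebuilding the example df1 from the question), the full pipeline including col1 could look like the following. Note the join/split round-trip assumes no list element itself contains the ',' separator:

```python
import pandas as pd

# Rebuild the example frame from the question
df1 = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': [['a', 'b'], ['c'], ['a', 'd', 'b'], ['e']],
})

# Join each list into one string, split it back out as dummy columns,
# then cast the 0/1 indicators to booleans.
dummies = df1['col2'].str.join(',').str.get_dummies(sep=',').astype(bool)

# Glue col1 back on to match the desired df2 layout.
df2 = pd.concat([df1[['col1']], dummies], axis=1)
print(df2)
```

If the list elements might contain commas, pick a separator that cannot occur in the data, or use one of the explode-based approaches below instead.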
Timings:
df1.shape
(40000, 2)
%%timeit
df1['col2'].str.join(',').str.get_dummies(sep=',').astype(bool)
286 ms ± 16.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
pd.get_dummies(df1['col2'].apply(pd.Series).stack()).sum(level=0)
9.43 s ± 499 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
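As an aside, .sum(level=0) was deprecated and later removed in pandas 2.0. A sketch of an equivalent on current pandas (assuming a version with DataFrame.explode, i.e. 0.25+) replaces apply(pd.Series).stack() with explode and re-aggregates with groupby:

```python
import pandas as pd

df1 = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': [['a', 'b'], ['c'], ['a', 'd', 'b'], ['e']],
})

# explode() turns each list element into its own row while keeping the
# original index, so groupby(level=0) re-aggregates per original row.
out = pd.get_dummies(df1['col2'].explode()).groupby(level=0).any()
print(out)
```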
One approach, using explode + crosstab:
df2 = df1.reset_index().explode('col2')
new_df = pd.crosstab(df2['col1'], df2['col2']).astype(bool)
print(new_df)
col2 a b c d e
col1
1 True True False False False
2 False False True False False
3 True True False True False
4 False False False False True
Or with groupby + unstack:
new_df = (df1.reset_index()
.explode('col2')
.groupby(['index', 'col2'])['col1']
.any()
.unstack(fill_value=False))
Or with pivot_table:
new_df = (df1.reset_index()
.explode('col2')
.pivot_table(index='index',
values='col1',
columns='col2',
aggfunc='any',
fill_value=False)
)
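Both variants produce the same boolean grid, keyed by the original row index. A self-contained sketch (variable names here are illustrative) confirming they agree elementwise:

```python
import pandas as pd

df1 = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': [['a', 'b'], ['c'], ['a', 'd', 'b'], ['e']],
})

# Shared first step: one row per (original index, list element) pair.
exploded = df1.reset_index().explode('col2')

# Variant 1: groupby + any + unstack.
via_unstack = (exploded
               .groupby(['index', 'col2'])['col1']
               .any()
               .unstack(fill_value=False))

# Variant 2: pivot_table with aggfunc='any'.
via_pivot = exploded.pivot_table(index='index',
                                 values='col1',
                                 columns='col2',
                                 aggfunc='any',
                                 fill_value=False)

print(via_unstack)
```

The two results are elementwise identical once both are cast to bool; the only practical difference is style.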