If I have a dataframe and it has col1 which is file name, col2 is values that I want transpose into file name with its values for example:
Input:
col1 col2
file1 text_0
file1 text_1
file1 text_2
file2 text_0
file2 text_1
file2 text_2
file2 text_3
file3 text_0
Output:
col1 col2 col3 col4 col5
file1 text_0 text_1 text_2
file2 text_0 text_1 text_2 text_3
file3 text_0
First idea is use GroupBy.cumcount for counter of duplicated values of col1 for new columns names and reshape by Series.unstack:
df = (df.set_index(['col1',df.groupby('col1').cumcount()])['col2']
.unstack(fill_value='')
.reset_index())
df.columns = [f'col{x}' for x in range(1, len(df.columns) + 1)]
print (df)
col1 col2 col3 col4 col5
0 file1 text_0 text_1 text_2
1 file2 text_0 text_1 text_2 text_3
2 file3 text_0
Or create Series of list and avoid use apply(pd.Series), because slow, better is use DataFrame constructor:
s = df.groupby('col1')['col2'].apply(list)
df = pd.DataFrame(s.tolist(), index=s.index).reset_index().fillna('')
df.columns = [f'col{x}' for x in range(1, len(df.columns) + 1)]
print (df)
col1 col2 col3 col4 col5
0 file1 text_0 text_1 text_2
1 file2 text_0 text_1 text_2 text_3
2 file3 text_0
Alternative:
s = df.groupby('col1')['col2'].apply(list)
L = [[k] + v for k, v in s.items()]
df = pd.DataFrame(L).fillna('').rename(columns=lambda x: f'col{x+1}')
print (df)
col1 col2 col3 col4 col5
0 file1 text_0 text_1 text_2
1 file2 text_0 text_1 text_2 text_3
2 file3 text_0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With