I have a csv with a table of data. I would like to read the csv and write a new xlsx file based on the initial csv.
However, I would also like to add a new column that will have a value based on the name of the header (e.g if the header contains the word online) and create sort of a pivot table based on this logic. So as opposed to having three columns for leads, I would have one column represented by three rows (for each column)
date online_won retail_won outbound_won online_leads retail_leads outbound_leads
1/1/11 9 10 11 12 14
2/1/11 1 2 13 15
3/1/11 10 8 14 17
This is the desired output
date source won leads
1/1/11 online 9 12
1/1/11 retail 10 14
1/1/11 outbound 11
.....
I would assume that I can solve this using pd.pivot_table. But can't figure out how to return columns as won and leads and extract only the online/retail/outbound part from the existing columns.
You could use pd.wide_to_long
, with a little extra work on the columns, since wide format variables are assumed to start with the stub names:
df.columns = ['_'.join(j for j in i[::-1]) for i in df.columns.str.split('_')]
(pd.wide_to_long(df, stubnames=['won','leads'], i='date', j='source', suffix='_\w+')
.reset_index())
date source won leads
0 1/1/11 _online 9 12.0
1 2/1/11 _online 1 15.0
2 3/1/11 _online 10 17.0
3 1/1/11 _retail 10 14.0
4 2/1/11 _retail 2 NaN
5 3/1/11 _retail 8 NaN
6 1/1/11 _outbound 11 NaN
7 2/1/11 _outbound 13 NaN
8 3/1/11 _outbound 14 NaN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With