Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pivot Tables on python

I have a csv with a table of data. I would like to read the csv and write a new xlsx file based on the initial csv.

However, I would also like to add a new column that will have a value based on the name of the header (e.g if the header contains the word online) and create sort of a pivot table based on this logic. So as opposed to having three columns for leads, I would have one column represented by three rows (for each column)

date    online_won    retail_won     outbound_won    online_leads   retail_leads     outbound_leads
1/1/11     9            10              11             12             14
2/1/11     1            2               13             15
3/1/11     10           8               14             17

This is the desired output

date      source   won    leads
1/1/11    online    9       12
1/1/11    retail   10       14
1/1/11    outbound 11       
.....

I would assume that I can solve this using pd.pivot_table. But can't figure out how to return columns as won and leads and extract only the online/retail/outbound part from the existing columns.

like image 493
Emm Avatar asked Mar 03 '23 09:03

Emm


1 Answers

You could use pd.wide_to_long, with a little extra work on the columns, since wide format variables are assumed to start with the stub names:

df.columns = ['_'.join(j for j in i[::-1]) for i in df.columns.str.split('_')]
(pd.wide_to_long(df, stubnames=['won','leads'], i='date', j='source', suffix='_\w+')
   .reset_index())

    date     source   won  leads
0  1/1/11    _online    9   12.0
1  2/1/11    _online    1   15.0
2  3/1/11    _online   10   17.0
3  1/1/11    _retail   10   14.0
4  2/1/11    _retail    2    NaN
5  3/1/11    _retail    8    NaN
6  1/1/11  _outbound   11    NaN
7  2/1/11  _outbound   13    NaN
8  3/1/11  _outbound   14    NaN
like image 164
yatu Avatar answered Mar 12 '23 08:03

yatu