Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Pandas Proc Transpose Equivalent

I have a sas proc transpose i'm trying to replicate in pandas.

Here is an example:

ID = ['ID1', 'ID1', 'ID1', 'ID1', 'ID1']
obs_week = [201701,201701,201701,201701,201701]
weeks_id = [1,2,3,4,5]
spend = [100,200,300,400,500]
df = pd.DataFrame(zip(ID, obs_week, weeks_id, spend ), columns = ['id', 'obs_week', 'weeks_id', 'spend'])
df

This gives a table like this:

    id  obs_week    weeks_id    spend
0   ID1 201701      1           100
1   ID1 201701      2           200
2   ID1 201701      3           300
3   ID1 201701      4           400
4   ID1 201701      5           500

I am trying to transpose this so that ID1 and obs_week become unique and then the weeks_id becomes the new columns with a prefix.

The sas code looks like this:

proc transpose data=spend out=spend_hh (drop = _label_ _name_) prefix=spend_;
  by id obs_week;
  id weeks_id;
  var spend;
run;

I have managed to get close using df.pivot_table

df.pivot_table(index=['id','obs_week'], columns='weeks_id', aggfunc=sum, fill_value=0)

Giving a table like this

                   spend
weeks_id           1    2   3   4   5
id       obs_week                   
ID1      201701    100  200 300 400 500

My issue is that i would like to rename 1 2 3 4 5 to be spend_1, spend_2, etc

I also want to do this for multiple different variables within the file but i assume i can just limit the selection to just the fields i want

My answer should look like this:

    id  obs_week    spend_1 spend_2 spend_3 spend_4 spend_5
0   ID1 201701      100     200     300     400     500

Is this just collapsing the headers somehow?

I also want id and obs_week to not be part of the index.

like image 718
TaxiMike Avatar asked Apr 16 '26 17:04

TaxiMike


2 Answers

You need list comprehension for create columns names first and then reset_index for column with index and rename_axis for remove weeks_id text:

df = df.pivot_table(index=['id','obs_week'], columns='weeks_id', aggfunc=sum, fill_value=0)

df.columns = ['{}_{}'.format(x[0], x[1]) for x in df.columns]
df = df.reset_index().rename_axis(None, axis=1)
print (df)
    id  obs_week  spend_1  spend_2  spend_3  spend_4  spend_5
0  ID1    201701      100      200      300      400      500

Or:

df.columns = ['_'.join((x[0], str(x[1]))) for x in df.columns]
df = df.reset_index().rename_axis(None, axis=1)
print (df)
    id  obs_week  spend_1  spend_2  spend_3  spend_4  spend_5
0  ID1    201701      100      200      300      400      500
like image 59
jezrael Avatar answered Apr 18 '26 08:04

jezrael


Here's a single-liner

In [1446]: (df.pivot_table(index=['id', 'obs_week'], columns=['weeks_id'], values='spend')
              .add_prefix('spend_')
              .reset_index())
Out[1446]:
weeks_id   id  obs_week  spend_1  spend_2  spend_3  spend_4  spend_5
0         ID1    201701      100      200      300      400      500

Or,

In [1449]: (df.pivot_table(index=['id', 'obs_week'], columns=['weeks_id'], values='spend')
              .add_prefix('spend_')
              .reset_index()
              .rename_axis(None, axis=1))
Out[1449]:
    id  obs_week  spend_1  spend_2  spend_3  spend_4  spend_5
0  ID1    201701      100      200      300      400      500
like image 30
Zero Avatar answered Apr 18 '26 06:04

Zero



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!