Pandas Melt with Multiple Value Vars

I have a data set in wide format, like this:

   Index Country     Variable 2000 2001 2002 2003 2004 2005
   0     Argentina   var1     12   15   18   17   23   29
   1     Argentina   var2     1    3    2    5    7    5
   2     Brazil      var1     20   23   25   29   31   32
   3     Brazil      var2     0    1    2    2    3    3
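To make the example reproducible, here is a minimal sketch that builds the wide table above as a DataFrame. It assumes the Index column is just the default row index and that the year headers are strings; both are guesses about the real data.

```python
import pandas as pd

# Wide table from the question. Year headers are assumed to be strings.
df = pd.DataFrame(
    [['Argentina', 'var1', 12, 15, 18, 17, 23, 29],
     ['Argentina', 'var2', 1, 3, 2, 5, 7, 5],
     ['Brazil', 'var1', 20, 23, 25, 29, 31, 32],
     ['Brazil', 'var2', 0, 1, 2, 2, 3, 3]],
    columns=['Country', 'Variable', '2000', '2001', '2002', '2003', '2004', '2005'])
print(df)
```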

I want to reshape my data to long format so that year, var1, and var2 become new columns:

  Index Country     year   var1 var2
  0     Argentina   2000   12   1
  1     Argentina   2001   15   3
  2     Argentina   2002   18   2
  ....
  6     Brazil      2000   20   0
  7     Brazil      2001   23   1

I got my code to work when I only had one variable by writing:

df = pd.melt(df, id_vars='Country', value_name='Var1', var_name='year')
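For comparison, a small sketch of what that single-variable melt produces. It uses only the var1 rows and the first three years, with the Variable column dropped; that layout is an assumption about how the one-variable data looked. Every column not listed in id_vars is melted, giving one row per Country and year. With two or more variables, the melted values would mix var1 and var2 in one column, so a second reshaping step is needed to split them back out.

```python
import pandas as pd

# Hypothetical one-variable data: var1 rows only, first three years,
# and no Variable column.
one_var = pd.DataFrame(
    [['Argentina', 12, 15, 18],
     ['Brazil', 20, 23, 25]],
    columns=['Country', '2000', '2001', '2002'])

# Every column not in id_vars is melted: its header goes into 'year'
# and its cells into 'Var1'.
long_one = pd.melt(one_var, id_vars='Country', value_name='Var1', var_name='year')

# melt works column by column, so rows come out grouped by year;
# sort to get the per-country order shown in the question.
long_one = long_one.sort_values(['Country', 'year']).reset_index(drop=True)
print(long_one)
```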

I can't figure out how to do this for var1, var2, var3, etc.

LauraF asked Jul 12 '17 20:07


1 Answer

Option 1

Using melt, then unstack to spread var1, var2, etc. into columns:

(df.melt(id_vars=['Country', 'Variable'], var_name='Year')
    .set_index(['Country','Year','Variable'])
    .squeeze()
    .unstack()
    .reset_index())

Output:

Variable    Country  Year  var1  var2
0         Argentina  2000    12     1
1         Argentina  2001    15     3
2         Argentina  2002    18     2
3         Argentina  2003    17     5
4         Argentina  2004    23     7
5         Argentina  2005    29     5
6            Brazil  2000    20     0
7            Brazil  2001    23     1
8            Brazil  2002    25     2
9            Brazil  2003    29     2
10           Brazil  2004    31     3
11           Brazil  2005    32     3
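To see why squeeze() is there, here is Option 1 split into its steps, on the sample data rebuilt inline (year headers assumed to be strings). After set_index only the single 'value' column is left, and squeeze() turns that one-column frame into a Series, so unstack() yields plain var1/var2 columns rather than a two-level ('value', var1) header.

```python
import pandas as pd

df = pd.DataFrame(
    [['Argentina', 'var1', 12, 15, 18, 17, 23, 29],
     ['Argentina', 'var2', 1, 3, 2, 5, 7, 5],
     ['Brazil', 'var1', 20, 23, 25, 29, 31, 32],
     ['Brazil', 'var2', 0, 1, 2, 2, 3, 3]],
    columns=['Country', 'Variable', '2000', '2001', '2002', '2003', '2004', '2005'])

# Step 1: melt the year columns; Country and Variable stay as identifiers.
# One row per (Country, Variable, Year), values in a column named 'value'.
long_df = df.melt(id_vars=['Country', 'Variable'], var_name='Year')

# Step 2: move the identifiers into the index; squeeze() turns the
# remaining one-column frame into a Series.
s = long_df.set_index(['Country', 'Year', 'Variable']).squeeze()

# Step 3: unstack the innermost level (Variable) into columns, then
# turn Country and Year back into ordinary columns.
result = s.unstack().reset_index()
print(result)
```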

Option 2

Using pivot then stack:

(df.pivot(index='Country', columns='Variable')
   .stack(0)
   .rename_axis(['Country','Year'])
   .reset_index())

Output:

Variable    Country  Year  var1  var2
0         Argentina  2000    12     1
1         Argentina  2001    15     3
2         Argentina  2002    18     2
3         Argentina  2003    17     5
4         Argentina  2004    23     7
5         Argentina  2005    29     5
6            Brazil  2000    20     0
7            Brazil  2001    23     1
8            Brazil  2002    25     2
9            Brazil  2003    29     2
10           Brazil  2004    31     3
11           Brazil  2005    32     3
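The key step in Option 2 is that pivot, called without a values argument, uses every remaining column (here the years) as values, so the intermediate result has two-level (year, Variable) columns. A sketch that prints that intermediate header before stack(0) moves the year level into the row index, again on the sample data rebuilt inline:

```python
import pandas as pd

df = pd.DataFrame(
    [['Argentina', 'var1', 12, 15, 18, 17, 23, 29],
     ['Argentina', 'var2', 1, 3, 2, 5, 7, 5],
     ['Brazil', 'var1', 20, 23, 25, 29, 31, 32],
     ['Brazil', 'var2', 0, 1, 2, 2, 3, 3]],
    columns=['Country', 'Variable', '2000', '2001', '2002', '2003', '2004', '2005'])

# With no `values` argument, pivot uses all remaining columns (the years)
# as values, so the columns become a (year, Variable) MultiIndex.
wide = df.pivot(index='Country', columns='Variable')
print(wide.columns.nlevels)
print(wide.columns.tolist()[:4])

# stack(0) moves the outer column level (the years) into the row index,
# leaving var1, var2, ... as the only columns.
result = (wide.stack(0)
              .rename_axis(['Country', 'Year'])
              .reset_index())
print(result)
```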

Option 3 (ayhan's solution)

Using set_index, stack, and unstack:

(df.set_index(['Country', 'Variable'])
   .rename_axis(['Year'], axis=1)
   .stack()
   .unstack('Variable')
   .reset_index())

Output:

Variable    Country  Year  var1  var2
0         Argentina  2000    12     1
1         Argentina  2001    15     3
2         Argentina  2002    18     2
3         Argentina  2003    17     5
4         Argentina  2004    23     7
5         Argentina  2005    29     5
6            Brazil  2000    20     0
7            Brazil  2001    23     1
8            Brazil  2002    25     2
9            Brazil  2003    29     2
10           Brazil  2004    31     3
11           Brazil  2005    32     3
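Since the question asks about var1, var2, var3, etc., here is Option 3 wrapped in a small helper. The name reshape_long and its parameters are hypothetical, chosen only for this sketch. Nothing in it is hard-coded to var1 or var2, so every distinct value in the Variable column becomes its own output column.

```python
import pandas as pd


def reshape_long(wide, id_col='Country', var_col='Variable', year_name='Year'):
    """Hypothetical helper wrapping Option 3: each distinct value in
    var_col becomes its own column, however many there are."""
    return (wide.set_index([id_col, var_col])
                # Name the column axis so the level created by stack() is labelled.
                .rename_axis([year_name], axis=1)
                # Move the year columns into the index: (id, variable, year).
                .stack()
                # Spread the variable level back out into columns.
                .unstack(var_col)
                .reset_index())


df = pd.DataFrame(
    [['Argentina', 'var1', 12, 15, 18, 17, 23, 29],
     ['Argentina', 'var2', 1, 3, 2, 5, 7, 5],
     ['Brazil', 'var1', 20, 23, 25, 29, 31, 32],
     ['Brazil', 'var2', 0, 1, 2, 2, 3, 3]],
    columns=['Country', 'Variable', '2000', '2001', '2002', '2003', '2004', '2005'])
print(reshape_long(df))
```

Adding a var3 row for each country would simply produce a var3 column in the output, with no change to the code.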
Scott Boston answered Sep 20 '22 05:09