Pandas: df.groupby(x, y).apply() across multiple columns parameter error

Tags: python, pandas

Basic Problem:

I have several 'past' and 'present' variables on which I'd like to compute a simple row-wise percent change. For example: (exports_now - exports_past) / exports_past.

The two questions below accomplish this, but when I try a similar method I get an error saying that my function deltas got an unexpected keyword argument, axis.

  • How to apply a function to two columns of Pandas dataframe
  • Pandas: How to use apply function to multiple columns

Data example:

exports_past     exports_now     imports_past     imports_now     etc. (6 other pairs)
   .23               .45             .43             .22              1.23
   .13               .21             .47             .32               .23
    0                 0              .41             .42               .93
   .23               .66             .43             .22               .21
    0                .12             .47             .21              1.23

Following the answer to the first question, my solution is to use a function like this:

def deltas(row):
    '''
    simple pct change
    '''
    if int(row[0]) == 0 and int(row[1]) == 0:
        return 0
    elif int(row[0]) == 0:
        return np.nan
    else:
        return ((row[1] - row[0])/row[0])

And apply the function like this:

df['exports_delta'] = df.groupby(['exports_past', 'exports_now']).apply(deltas, axis=1)

This generates the following error:

TypeError: deltas() got an unexpected keyword argument 'axis'

Any ideas on how to get around the axis parameter error, or on a more elegant way to calculate the percent change? The kicker is that I need to be able to apply this function across several different column pairs, so hard-coding the column names as in the answer to the second question is undesirable. Thanks!

Asked by agconti, Jul 31 '13 14:07


1 Answer

Consider using the pct_change Series/DataFrame method for this. By default it works down the rows, so pass axis=1 to compare adjacent columns instead.

df.pct_change()
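
A minimal sketch of what pct_change computes, on a hypothetical two-row frame (the column names follow the question, the values are made up). It also shows plain column arithmetic on one pair, which may be the simplest route when past and now columns are not guaranteed to sit next to each other:

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({"exports_past": [2.0, 4.0], "exports_now": [3.0, 5.0]})

# Default: each row is compared with the previous row of the same column.
by_row = df.pct_change()

# axis="columns": each column is compared with the column to its left.
by_col = df.pct_change(axis="columns")

# Plain arithmetic on the pair; does not depend on column order.
exports_delta = (df["exports_now"] - df["exports_past"]) / df["exports_past"]
```

Here by_col["exports_now"] and exports_delta hold the same values, while by_row describes the change from the first row to the second.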

The confusion stems from two different (but identically named) apply methods, one on Series/DataFrame and one on groupby objects.

In [11]: df
Out[11]:
   0  1  2
0  1  1  1
1  2  2  2

The DataFrame apply method takes an axis argument:

In [12]: df.apply(lambda x: x[0] + x[1], axis=0)
Out[12]:
0    3
1    3
2    3
dtype: int64

In [13]: df.apply(lambda x: x[0] + x[1], axis=1)
Out[13]:
0    2
1    4
dtype: int64
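
For the question's column pairs, the same axis=1 form can be called on a two-column slice of the frame rather than on a groupby. A minimal sketch, assuming the columns are named <prefix>_past and <prefix>_now; the prefix list and the three sample rows are illustrative. It compares against 0 directly, because int(0.23) is 0 and the int() checks in the question's deltas would treat every value below 1 as zero:

```python
import numpy as np
import pandas as pd

def deltas(row):
    """Percent change from the first value (past) to the second (now)."""
    past, now = row.iloc[0], row.iloc[1]
    if past == 0 and now == 0:
        return 0.0
    if past == 0:
        return np.nan  # change from zero is undefined
    return (now - past) / past

# Three rows taken from the question's sample data.
df = pd.DataFrame({
    "exports_past": [0.23, 0.0, 0.0],
    "exports_now": [0.45, 0.0, 0.12],
    "imports_past": [0.43, 0.41, 0.47],
    "imports_now": [0.22, 0.42, 0.21],
})

# Loop over the prefixes instead of hard-coding each pair.
for name in ["exports", "imports"]:
    pair = df[[f"{name}_past", f"{name}_now"]]
    df[f"{name}_delta"] = pair.apply(deltas, axis=1)
```

Extending this to the other six pairs only requires adding their prefixes to the list.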

The groupby apply doesn't take an axis argument; any extra keyword arguments are passed straight through to the function (here g is a groupby object):

In [14]: g.apply(lambda x: x[0] + x[1])
Out[14]:
0    2
1    4
dtype: int64

In [15]: g.apply(lambda x: x[0] + x[1], axis=1)
TypeError: <lambda>() got an unexpected keyword argument 'axis'
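
The pass-through can be seen with a function that does accept an extra keyword. A small sketch; the frame, the total function and its scale parameter are hypothetical names chosen for illustration:

```python
import pandas as pd

# Hypothetical data: two groups keyed by "key".
df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]})

def total(values, scale=1):
    # `scale` is not consumed by groupby apply; it is forwarded here.
    return values.sum() * scale

g = df.groupby("key")["value"]

# scale=10 reaches total() once per group.
result = g.apply(total, scale=10)

# axis=1 is forwarded the same way, and total() does not accept it.
try:
    g.apply(total, axis=1)
except TypeError as exc:
    message = str(exc)
```

This is the same mechanism behind the error in the question: axis=1 was handed to deltas, not interpreted by apply.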

Note that groupby itself does have an axis argument, so you can use it there if you really want to (newer pandas releases deprecate axis=1 in groupby):

In [16]: g1 = df.groupby(0, axis=1)

In [17]: g1.apply(lambda x: x.iloc[0, 0] + x.iloc[1, 0])
Out[17]:
0
1    3
2    3
dtype: int64
Answered by Andy Hayden, Sep 21 '22 03:09