
Apply function with args in pandas

I am trying to check, for each row of a data frame, whether the month of Date appears in PromoInterval.

print dset1

        Store   Date    PromoInterval
1760    2   2013-05-04  Jan,Apr,Jul,Oct
1761    2   2013-05-03  Jan,Apr,Jul,Oct
1762    2   2013-05-02  Jan,Apr,Jul,Oct
1763    2   2013-05-01  Jan,Apr,Jul,Oct
1764    2   2013-04-30  Jan,Apr,Jul,Oct

def func(a,b):
    y = b.split(",")
    z = {1:'Jan',2:'Feb',3:'Mar', 4:'Apr',5:'May',6:'Jun',7:'Jul',8:'Aug',9:'Sep',
        10:'Oct',11:'Nov',12:'Dec'}
    return (z[a] in y)

dset1.apply(func, axis=1, args = (dset1['Date'].dt.month, dset1['PromoInterval']) )

I am stuck on the error below:

dset1.apply(func, axis=1, args = (dset1['Date'].dt.month, dset1['PromoInterval']) )
('func() takes exactly 2 arguments (3 given)', u'occurred at index 1760')

Data set:

{'Date': {1760: Timestamp('2013-05-04 00:00:00'),
  1761: Timestamp('2013-05-03 00:00:00'),
  1762: Timestamp('2013-05-02 00:00:00'),
  1763: Timestamp('2013-05-01 00:00:00'),
  1764: Timestamp('2013-04-30 00:00:00')},
 'PromoInterval': {1760: 'Jan,Apr,Jul,Oct',
  1761: 'Jan,Apr,Jul,Oct',
  1762: 'Jan,Apr,Jul,Oct',
  1763: 'Jan,Apr,Jul,Oct',
  1764: 'Jan,Apr,Jul,Oct'},
 'Store': {1760: 2, 1761: 2, 1762: 2, 1763: 2, 1764: 2}}

WoodChopper asked Oct 28 '15 13:10



3 Answers

I would start by formatting the text string of the month using a lambda function on the 'Date' column:

df['Month'] = df['Date'].apply(lambda x: x.strftime('%b'))

Then I would apply a lambda function with axis=1, which means it is called once per row of the dataframe. Here I simply check whether 'Month' is in 'PromoInterval':

df[['PromoInterval', 'Month']].apply(lambda x: x['Month'] in x['PromoInterval'], axis=1)

1760    False
1761    False
1762    False
1763    False
1764     True
dtype: bool
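
For reference, the two steps above can be put together as a self-contained sketch built from the data in the question. It assumes an English locale, since '%b' is locale-dependent, and it splits PromoInterval on commas so that the check is against whole list entries rather than a substring. The name in_promo is only illustrative.

```python
import pandas as pd

# Data copied from the question
df = pd.DataFrame(
    {
        "Store": [2, 2, 2, 2, 2],
        "Date": pd.to_datetime(
            ["2013-05-04", "2013-05-03", "2013-05-02", "2013-05-01", "2013-04-30"]
        ),
        "PromoInterval": ["Jan,Apr,Jul,Oct"] * 5,
    },
    index=[1760, 1761, 1762, 1763, 1764],
)

# Abbreviated month name per date (assumes an English locale for '%b')
df["Month"] = df["Date"].dt.strftime("%b")

# axis=1 calls the lambda once per row; columns are looked up by label
in_promo = df.apply(
    lambda row: row["Month"] in row["PromoInterval"].split(","), axis=1
)
print(in_promo)
```

Only the last row (2013-04-30) falls in a promo month, which matches the output shown above.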

firelynx answered Oct 15 '22 09:10


A solution is to make your function take a row instead of elements:

def func(row):
    # Access the columns by label rather than by position
    y = row['PromoInterval'].split(",")
    z = {1:'Jan', 2:'Feb', 3:'Mar', 4:'Apr', 5:'May', 6:'Jun',
        7:'Jul', 8:'Aug', 9:'Sep', 10:'Oct', 11:'Nov', 12:'Dec'}
    return (z[row['Date'].month] in y)

You can then apply it straightforwardly:

df['Result'] = df.apply(func, axis=1)

Note: the function uses .month because I converted dates to datetime objects with pd.to_datetime.
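
As a rough illustration of that note, the sketch below starts from dates stored as strings, converts them with pd.to_datetime, and then applies a row function. The two sample rows are taken from the question; the names MONTH_ABBR and in_promo_interval are hypothetical and used only for this example.

```python
import pandas as pd

# Hypothetical helper mapping: month number -> abbreviation used in PromoInterval
MONTH_ABBR = {1: "Jan", 2: "Feb", 3: "Mar", 4: "Apr", 5: "May", 6: "Jun",
              7: "Jul", 8: "Aug", 9: "Sep", 10: "Oct", 11: "Nov", 12: "Dec"}


def in_promo_interval(row):
    """Return True if the month of row['Date'] is listed in row['PromoInterval']."""
    months = row["PromoInterval"].split(",")
    return MONTH_ABBR[row["Date"].month] in months


# Dates start out as plain strings here
df = pd.DataFrame(
    {
        "Store": [2, 2],
        "Date": ["2013-05-04", "2013-04-30"],
        "PromoInterval": ["Jan,Apr,Jul,Oct"] * 2,
    }
)

# Without this conversion the values are str objects and have no .month attribute
df["Date"] = pd.to_datetime(df["Date"])

df["Result"] = df.apply(in_promo_interval, axis=1)
print(df)
```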


IanS answered Oct 15 '22 08:10


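This happens because, with axis=1, apply passes each row as the first argument and then appends the items in args, so the function is called with three arguments, not two. Printing what the function receives makes this visible: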

def func(df,a,b):
    print('---df----')
    print(df)
    print('---a---')
    print(a)
    print('---b---')
    print(b)
    y = b.split(",")
    z = {1:'Jan',2:'Feb',3:'Mar', 4:'Apr',5:'May',6:'Jun',7:'Jul',8:'Aug',9:'Sep',
        10:'Oct',11:'Nov',12:'Dec'}
    return (z[a] in y)

In [98]:
dset1.apply(func, axis=1, args = (dset1['Date'].dt.month, dset1['PromoInterval']) )

In [99]:

---df----
Store                              2
Date             2013-05-04 00:00:00
PromoInterval        Jan,Apr,Jul,Oct
Name: 0, dtype: object
---a---
0    5
1    5
2    5
3    5
4    4
dtype: int64
---b---
0    Jan,Apr,Jul,Oct
1    Jan,Apr,Jul,Oct
2    Jan,Apr,Jul,Oct
3    Jan,Apr,Jul,Oct
4    Jan,Apr,Jul,Oct
Name: PromoInterval, dtype: object
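
If extra arguments are genuinely needed, args works well for values that are the same for every row. The sketch below is one possible arrangement, not the only one: the function takes the row first, followed by two extra parameters supplied through args. The parameter names month_names and sep and the constant MONTH_ABBR are assumptions made for this example.

```python
import pandas as pd

# Data copied from the question
dset1 = pd.DataFrame(
    {
        "Store": [2, 2, 2, 2, 2],
        "Date": pd.to_datetime(
            ["2013-05-04", "2013-05-03", "2013-05-02", "2013-05-01", "2013-04-30"]
        ),
        "PromoInterval": ["Jan,Apr,Jul,Oct"] * 5,
    },
    index=[1760, 1761, 1762, 1763, 1764],
)

# Hypothetical constant shared by all rows
MONTH_ABBR = {1: "Jan", 2: "Feb", 3: "Mar", 4: "Apr", 5: "May", 6: "Jun",
              7: "Jul", 8: "Aug", 9: "Sep", 10: "Oct", 11: "Nov", 12: "Dec"}


def func(row, month_names, sep):
    # row is supplied by apply; month_names and sep come from args, in order
    return month_names[row["Date"].month] in row["PromoInterval"].split(sep)


result = dset1.apply(func, axis=1, args=(MONTH_ABBR, ","))
print(result)
```

Per-row values such as the date and the promo string are read from the row itself, so they do not need to be passed through args.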

Instead, you can have the function take only the row and read both columns from it:

In [94]:

def func(df):
    y = df['PromoInterval'].split(",")
    z = {1:'Jan',2:'Feb',3:'Mar', 4:'Apr',5:'May',6:'Jun',7:'Jul',8:'Aug',9:'Sep',
    10:'Oct',11:'Nov',12:'Dec'}
    return (z[df.Date.month] in y)

In [95]:
dset1.apply(func, axis=1)



Out[112]:
0    False
1    False
2    False
3    False
4     True
dtype: bool

Nader Hisham answered Oct 15 '22 09:10