
Apply function to dataframe column element based on value in other column for same row?

I have a dataframe:

import pandas as pd

df = pd.DataFrame(
    {'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})

df = 
    number    condition
0    10         A
1    20         B
2    30         A
3    40         B

I want to apply a function to each element within the number column, as follows:

 df['number'] = df['number'].apply(lambda x: func(x))

BUT, even though I apply the function to the number column, I want the function to also reference the condition column, i.e., in pseudocode:

func(n):
    #if the value in corresponding condition column is equal to some set of values:
        # do some stuff to n using the value in condition
        # return new value for n

For a single number and an example function, I would write:

number = 10
condition = 'A'

def func(num, condition):
    if condition == 'A':
        return num * 3
    if condition == 'B':
        return num * 4

func(number, condition)  # returns 30

How can I incorporate the same function into my apply statement above, i.e., reference the value in the condition column while acting on the value in the number column?

Note: I have read through the docs on np.where(), DataFrame.loc and DataFrame.index, but I just cannot figure out how to put them into practice.

I am struggling with the syntax for referencing the other column from within the function, as I need access to both the number and condition values of the same row.

As such, my expected output is:

df = 
    number    condition
0    30         A
1    80         B
2    90         A
3    160         B
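Since np.where() and .loc are mentioned above, here is a minimal sketch of both for this example. It assumes the numbers are stored as integers rather than strings and that 'A' and 'B' (with the multipliers 3 and 4 from the example function) are the only conditions:

```python
import numpy as np
import pandas as pd

# Assumption: integer numbers, and 'A'/'B' are the only conditions.
df = pd.DataFrame({'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})

# np.where: compute both candidate columns, then pick element-wise.
# Rows with condition 'A' get number * 3; every other row gets number * 4.
df['number'] = np.where(df['condition'] == 'A', df['number'] * 3, df['number'] * 4)
print(df['number'].tolist())  # [30, 80, 90, 160]

# .loc: multiply only the rows whose condition matches, one condition at a time.
df2 = pd.DataFrame({'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})
df2.loc[df2['condition'] == 'A', 'number'] *= 3
df2.loc[df2['condition'] == 'B', 'number'] *= 4
print(df2['number'].tolist())  # [30, 80, 90, 160]
```

One difference worth noting: the np.where version treats every non-'A' row as 'B', while the .loc version leaves rows with any other condition unchanged.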

UPDATE: The above was far too vague. Please see the following:

df1 = pd.DataFrame({'Entries':['man','guy','boy','girl'],'Conflict':['Yes','Yes','Yes','No']})


    Entries    Conflict
0    "man"    "Yes"
1    "guy"    "Yes"
2    "boy"    "Yes"
3    "girl"   "No"

def funcA(d):
    d = d + 'aaa'
    return d
def funcB(d):
    d = d + 'bbb'
    return d

df1['Entries'] = np.where(df1['Conflict'] == 'Yes', funcA, funcB)

Output:
{'Conflict': ['Yes', 'Yes', 'Yes', 'No'],
 'Entries': array(<function funcB at 0x7f4acbc5a500>, dtype=object)}

How can I adapt the above np.where statement to work on a pandas Series, as mentioned in the comments, and produce the desired output shown below?

Desired Output:

    Entries    Conflict
0    "manaaa"    "Yes"
1    "guyaaa"    "Yes"
2    "boyaaa"    "Yes"
3    "girlbbb"   "No"
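The np.where call above passes the function objects funcA and funcB themselves, so the result holds functions rather than strings. A sketch of one fix, assuming funcA and funcB only use operations that work element-wise on a whole Series (as `+ 'aaa'` does), is to call each function on the column first and let np.where choose between the two results:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Entries': ['man', 'guy', 'boy', 'girl'],
                    'Conflict': ['Yes', 'Yes', 'Yes', 'No']})

def funcA(d):
    return d + 'aaa'

def funcB(d):
    return d + 'bbb'

# Each call returns a full Series, because string concatenation on a
# Series is element-wise. np.where then keeps, row by row, the value from
# funcA's result where Conflict is 'Yes' and from funcB's result otherwise.
df1['Entries'] = np.where(df1['Conflict'] == 'Yes',
                          funcA(df1['Entries']),
                          funcB(df1['Entries']))

print(df1['Entries'].tolist())  # ['manaaa', 'guyaaa', 'boyaaa', 'girlbbb']
```

Both functions run on every row, so this only suits cheap functions without side effects.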
asked Jan 31 '17 16:01 by Chuck




2 Answers

Since the question is about applying a function to a dataframe column based on another column in the same row, it seems more accurate to use the pandas apply function in combination with a lambda:

import pandas as pd
df = pd.DataFrame({'number': [10, 20 , 30, 40], 'condition': ['A', 'B', 'A', 'B']})

def func(number,condition):
    multiplier = {'A': 2, 'B': 4}
    return number * multiplier[condition]

df['new_number'] = df.apply(lambda x: func(x['number'], x['condition']), axis=1)

In this example, apply with axis=1 passes each row of df to the lambda as a Series; the lambda reads that row's 'number' and 'condition' values and passes them to func.
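To match the question's expected output exactly (multipliers 3 and 4, with the result written back into number instead of a new column), the same pattern can be used with the question's own function. A sketch; returning the number unchanged for an unknown condition is an assumption, not part of the question:

```python
import pandas as pd

df = pd.DataFrame({'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})

def func(num, condition):
    # Multipliers taken from the question's example function.
    if condition == 'A':
        return num * 3
    if condition == 'B':
        return num * 4
    return num  # assumption: leave any other condition unchanged

# axis=1 hands each row to the lambda as a Series, so both columns of that
# row are available by label. Assigning to 'number' overwrites the column.
df['number'] = df.apply(lambda row: func(row['number'], row['condition']), axis=1)

print(df['number'].tolist())  # [30, 80, 90, 160]
```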

This returns the following result:

df
Out[10]:
  condition  number  new_number
0         A      10          20
1         B      20          80
2         A      30          60
3         B      40         160

For the UPDATE case, it's also possible to use the pandas apply function:

df1 = pd.DataFrame({'Entries':['man','guy','boy','girl'],'Conflict':['Yes','Yes','Yes','No']})

def funcA(d):
    d = d + 'aaa'
    return d
def funcB(d):
    d = d + 'bbb'
    return d

df1['Entries'] = df1.apply(lambda x: funcA(x['Entries']) if x['Conflict'] == 'Yes' else funcB(x['Entries']), axis=1)

In this example, apply with axis=1 passes each row of df1 to the lambda, which sends that row's 'Entries' value to either funcA or funcB. Which function is used is decided by the inline if-else on the same row's 'Conflict' value.
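Because funcA and funcB here only concatenate strings, which works on a whole Series at once, a boolean mask with .loc can replace the per-row lambda. A sketch, assuming the functions are vectorizable in this way (an arbitrary per-element Python function would still need apply or a comprehension):

```python
import pandas as pd

df1 = pd.DataFrame({'Entries': ['man', 'guy', 'boy', 'girl'],
                    'Conflict': ['Yes', 'Yes', 'Yes', 'No']})

def funcA(d):
    return d + 'aaa'

def funcB(d):
    return d + 'bbb'

# Boolean mask selecting the rows where Conflict is 'Yes'.
mask = df1['Conflict'] == 'Yes'

# Apply each function only to its own subset of rows; ~mask selects the rest.
# Assignment aligns on the index, so each result lands in its original rows.
df1.loc[mask, 'Entries'] = funcA(df1.loc[mask, 'Entries'])
df1.loc[~mask, 'Entries'] = funcB(df1.loc[~mask, 'Entries'])

print(df1['Entries'].tolist())  # ['manaaa', 'guyaaa', 'boyaaa', 'girlbbb']
```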

This returns the following result:

df1
Out[12]:
  Conflict  Entries
0      Yes   manaaa
1      Yes   guyaaa
2      Yes   boyaaa
3       No  girlbbb
answered Sep 18 '22 05:09 by Rene B.



I don't know about using pandas.DataFrame.apply, but you could define a condition-to-multiplier mapping (the multiplier dict below) and pass it into your function. Then you can use a list comprehension to calculate the new number for each row based on its condition:

import pandas as pd
df = pd.DataFrame({'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})

# Maps each condition to the multiplier applied to that row's number.
multiplier = {'A': 3, 'B': 4}

def func(num, condition, multiplier):
    return num * multiplier[condition]

# zip pairs each row's number with its condition, regardless of the index labels.
df['new_number'] = [func(num, cond, multiplier)
                    for num, cond in zip(df['number'], df['condition'])]

Here's the result:

df
Out[24]: 
  condition  number  new_number
0         A      10          30
1         B      20          80
2         A      30          90
3         B      40         160

There is likely a vectorized, pure-pandas solution that's more "ideal." But this works, too, in a pinch.
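One vectorized option, when the per-row logic is just a lookup like this, is Series.map with the same dictionary. A sketch, using the 3/4 multipliers from the result table above:

```python
import pandas as pd

df = pd.DataFrame({'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})

multiplier = {'A': 3, 'B': 4}

# Series.map looks up each condition in the dict, giving a Series of
# multipliers aligned with df; the multiplication is then element-wise.
df['new_number'] = df['number'] * df['condition'].map(multiplier)

print(df['new_number'].tolist())  # [30, 80, 90, 160]
```

A condition missing from the dictionary maps to NaN, so new_number would be NaN for that row (and the column would become float).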

answered Sep 18 '22 05:09 by blacksite
