I have a dataframe:
df = pd.DataFrame(
    {'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})
df =
   number condition
0      10         A
1      20         B
2      30         A
3      40         B
I want to apply a function to each element within the number column, as follows:
df['number'] = df['number'].apply(lambda x: func(x))
However, even though I apply the function to the number column, I want the function to also refer to the condition column, i.e. in pseudocode:
func(n):
    # if the value in the corresponding condition column is equal to some set of values:
    #     do some stuff to n using the value in condition
    #     return new value for n
For a single number, and an example function I would write:
number = 10
condition = 'A'
def func(num, condition):
    if condition == 'A':
        return num * 3
    if condition == 'B':
        return num * 4
func(number, condition)  # returns 30
How can I incorporate the same function into my apply statement written above, i.e. refer to the value in the condition column while acting on the value in the number column?
Note: I have read through the docs on np.where(), pandas.loc() and pandas.index(), but I just cannot figure out how to put them into practice.
I am struggling with the syntax for referencing the other column from within the function, as I need access to the values in both the number and condition columns.
As such, my expected output is:
df =
   number condition
0      30         A
1      80         B
2      90         A
3     160         B
UPDATE: The above was far too vague. Please see the following:
df1 = pd.DataFrame({'Entries':['man','guy','boy','girl'],'Conflict':['Yes','Yes','Yes','No']})
   Entries  Conflict
0  "man"    "Yes"
1  "guy"    "Yes"
2  "boy"    "Yes"
3  "girl"   "No"
def funcA(d):
    d = d + 'aaa'
    return d
def funcB(d):
    d = d + 'bbb'
    return d
df1['Entries'] = np.where(df1['Conflict'] == 'Yes', funcA, funcB)
Output:
{'Conflict': ['Yes', 'Yes', 'Yes', 'No'],
'Entries': array(<function funcB at 0x7f4acbc5a500>, dtype=object)}
How can I apply the above np.where statement to take a pandas series as mentioned in the comments, and produce the desired output shown below:
Desired Output:
   Entries    Conflict
0  "manaaa"   "Yes"
1  "guyaaa"   "Yes"
2  "boyaaa"   "Yes"
3  "girlbbb"  "No"
As the question is about applying a function to a dataframe column while using another column's value from the same row, it seems more accurate to use the pandas apply function in combination with lambda:
import pandas as pd
df = pd.DataFrame({'number': [10, 20 , 30, 40], 'condition': ['A', 'B', 'A', 'B']})
def func(number, condition):
    multiplier = {'A': 2, 'B': 4}
    return number * multiplier[condition]
df['new_number'] = df.apply(lambda x: func(x['number'], x['condition']), axis=1)
In this example, apply with axis=1 passes each row of the dataframe df to the lambda, which takes that row's 'number' and 'condition' values and passes them to the function func.
This returns the following result:
df
Out[10]:
  condition  number  new_number
0         A      10          20
1         B      20          80
2         A      30          60
3         B      40         160
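To make it clearer what the lambda actually receives, here is a small sketch. The helper inspect_row and the list seen are hypothetical names used only for illustration. It records the type and index labels of each object that apply passes in when axis=1 is set, which shows why x['number'] and x['condition'] work inside the function.

```python
import pandas as pd

df = pd.DataFrame({'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})

seen = []  # hypothetical helper list: records what apply hands to the function

def inspect_row(row):
    # With axis=1, each call receives one row as a pandas Series
    # whose index labels are the dataframe's column names.
    seen.append((type(row).__name__, list(row.index)))
    return row['number']

result = df.apply(inspect_row, axis=1)
print(seen[0])
print(result.tolist())
```

Because every row is wrapped in its own Series, this approach is convenient but usually slower than column-wise operations on large dataframes.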
For the UPDATE case it's also possible to use the pandas apply function:
df1 = pd.DataFrame({'Entries':['man','guy','boy','girl'],'Conflict':['Yes','Yes','Yes','No']})
def funcA(d):
    d = d + 'aaa'
    return d
def funcB(d):
    d = d + 'bbb'
    return d
df1['Entries'] = df1.apply(lambda x: funcA(x['Entries']) if x['Conflict'] == 'Yes' else funcB(x['Entries']), axis=1)
In this example, the lambda takes the 'Entries' and 'Conflict' values of each row of the dataframe df1 and passes the 'Entries' value to either funcA or funcB. An if-else expression inside the lambda decides which of the two functions is applied.
This returns the following result:
df1
Out[12]:
  Conflict  Entries
0      Yes   manaaa
1      Yes   guyaaa
2      Yes   boyaaa
3       No  girlbbb
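The np.where attempt from the question can also be made to work. As a sketch: np.where expects arrays of values for its second and third arguments, not function objects, so each function has to be called on the whole 'Entries' column first. This assumes, as is the case for funcA and funcB here, that the functions work on a whole Series as well as on a single string.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Entries': ['man', 'guy', 'boy', 'girl'],
                    'Conflict': ['Yes', 'Yes', 'Yes', 'No']})

def funcA(d):
    return d + 'aaa'

def funcB(d):
    return d + 'bbb'

# Call both functions on the full column, then let np.where
# choose, row by row, which of the two results to keep.
df1['Entries'] = np.where(df1['Conflict'] == 'Yes',
                          funcA(df1['Entries']),
                          funcB(df1['Entries']))
print(df1)
```

Note that both branches are computed for every row before the selection happens, so this only fits functions that are cheap and safe to run on all values.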
I don't know about using pandas.DataFrame.apply, but you could define a condition:multiplier key-value mapping (see multiplier below) and pass that into your function. Then you can use a list comprehension to calculate the new number output based on those conditions:
import pandas as pd
df = pd.DataFrame({'number': [10, 20 , 30, 40], 'condition': ['A', 'B', 'A', 'B']})
multiplier = {'A': 3, 'B': 4}
def func(num, condition, multiplier):
    return num * multiplier[condition]
df['new_number'] = [func(df.loc[idx, 'number'], df.loc[idx, 'condition'],
                         multiplier) for idx in range(len(df))]
Here's the result:
df
Out[24]:
  condition  number  new_number
0         A      10          30
1         B      20          80
2         A      30          90
3         B      40         160
There is likely a vectorized, pure-pandas solution that's more "ideal." But this works, too, in a pinch.
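One possible vectorized version, as a sketch: look up each row's multiplier with Series.map and multiply the two columns directly. The multiplier values 3 and 4 are taken from the example function in the question; substitute your own mapping.

```python
import pandas as pd

df = pd.DataFrame({'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})

# Mapping taken from the question's example function (A -> 3, B -> 4).
multiplier = {'A': 3, 'B': 4}

# map() replaces each condition with its multiplier; the product is
# then computed column-wise, with no Python-level loop over rows.
df['new_number'] = df['number'] * df['condition'].map(multiplier)
print(df)
```

Any condition that is missing from the mapping becomes NaN after map, so the corresponding new_number would be NaN as well.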