Hi I have a pandas dataframe, I have a column A.
data = pandas.DataFrame()
data['A']= [numpy.random.choice([1,2,3,4,5,6]) for i in range(10)]
I want to add a column B condition on A (when A =1 then B=0, when A>5 then B=1) instead of using:
data.loc[data['A']==1,'B']=0
data.loc[data['A']>5, 'B']=1
Here I want to create a function to do this given the condition as a dict: {'A=1':0,'A>5':1} so I could use add_column({'A=1':0,'A>5':1}, 'B') to do the code above. I am thinking it is tricky to deal with the operators, any good idea?
def add_column(condition_dict, NewColumnName):
pass
While there may be efficient ways to do it, one possible way might be to use eval function.
Creating input df:
import pandas as pd
import numpy as np
data = pd.DataFrame()
data['A']= [np.random.choice([1,2,3,4,5,6]) for i in range(10)]
print(data)
Input df:
A
0 4
1 3
2 3
3 1
4 1
5 2
6 3
7 6
8 2
9 1
Now, a function is created such that it iterates through each row of dataframe and condition_dict and when row evaluation matches value is stored in list for corresponding row which is updated for new column. If none of the condition matches then with default it will None:
def add_column(df, condition_dict, NewColumnName):
new_values = []
for index, row in df.iterrows():
# if none of the condition matches then put default value
default_value = None
# iterate through each condition to check if any matches
for key, value in condition_dict.items():
expression = 'row.' + key
if(eval(expression)):
default_value = value
# add corresponding rows new value for new column
new_values.append(default_value)
df[NewColumnName] = new_values
Now, to call the function:
add_column(data, {'A==1':0, 'A>5':1}, 'B')
print(data)
Output:
A B
0 4 NaN
1 3 NaN
2 3 NaN
3 1 0.0
4 1 0.0
5 2 NaN
6 3 NaN
7 6 1.0
8 2 NaN
9 1 0.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With