I am creating a sample dataframe:
import pandas as pd

tp = pd.DataFrame({'source': ['a', 's', 'f'],
                   'target': ['b', 'n', 'm'],
                   'count': [0, 8, 4]})
Then I create a column 'col' based on a condition on the 'target' column: if 'target' is 'b' or 'n', 'col' takes the value from 'source'; otherwise it gets the default 'x':
tp['col'] = tp.apply(lambda row: row['source'] if row['target'] in ['b', 'n'] else 'x')
But it throws this error: KeyError: ('target', 'occurred at index count')
How can I make this work without defining a function?
To apply a function to every row, pass axis=1 to apply(). The default is axis=0, which applies the function to each column instead. Applying a function to each row lets you build a new column from that row's values, update the row, and so on.
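To see why the question's code raises a KeyError, this small sketch (reusing the question's sample data) asks, for every call that apply() makes, whether 'target' is one of the labels of the Series the lambda receives. With the default axis=0 each call gets a column indexed by the row labels, so 'target' is never found; with axis=1 each call gets a row indexed by the column names.

```python
import pandas as pd

tp = pd.DataFrame({'source': ['a', 's', 'f'],
                   'target': ['b', 'n', 'm'],
                   'count': [0, 8, 4]})

# axis=0 (default): each call receives a whole column, indexed by the row labels 0, 1, 2,
# so 'target' is never one of its labels and row['target'] raises KeyError.
per_column = tp.apply(lambda s: 'target' in s.index)
print(per_column.tolist())  # [False, False, False]

# axis=1: each call receives a single row, indexed by the column names.
per_row = tp.apply(lambda s: 'target' in s.index, axis=1)
print(per_row.tolist())  # [True, True, True]
```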
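You need to use axis=1 to tell Pandas you want to apply a function to each row. The default is axis=0, so your lambda is called once per column (the error message shows it failed on column 'count'), each time with a Series indexed by the row labels 0, 1, 2. That Series has no 'target' label, hence the KeyError.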
tp['col'] = tp.apply(lambda row: row['source'] if row['target'] in ['b', 'n'] else 'x',
                     axis=1)
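Here is a self-contained version of the axis=1 fix, runnable as-is. The expected values follow directly from the sample data: 'b' and 'n' match, so those rows copy 'source', while 'm' falls back to 'x'.

```python
import pandas as pd

tp = pd.DataFrame({'source': ['a', 's', 'f'],
                   'target': ['b', 'n', 'm'],
                   'count': [0, 8, 4]})

# axis=1 passes each row as a Series indexed by column name,
# so row['source'] and row['target'] resolve as intended.
tp['col'] = tp.apply(
    lambda row: row['source'] if row['target'] in ['b', 'n'] else 'x',
    axis=1,
)
print(tp['col'].tolist())  # ['a', 's', 'x']
```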
However, for this specific task you should use vectorised operations, for example numpy.where:
import numpy as np

tp['col'] = np.where(tp['target'].isin(['b', 'n']), tp['source'], 'x')
pd.Series.isin returns a Boolean series which tells numpy.where, element by element, whether to select the second argument (tp['source']) or the third ('x').
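The sketch below makes the intermediate Boolean mask visible and then feeds it to numpy.where. As an aside, it also shows Series.where, a pandas-only vectorised alternative not mentioned in the answer: it keeps the values where the condition is True and substitutes the given default elsewhere. The column name 'col2' is hypothetical, used only to hold that second result.

```python
import numpy as np
import pandas as pd

tp = pd.DataFrame({'source': ['a', 's', 'f'],
                   'target': ['b', 'n', 'm'],
                   'count': [0, 8, 4]})

# Element-wise membership test: True where 'target' is 'b' or 'n'.
mask = tp['target'].isin(['b', 'n'])
print(mask.tolist())  # [True, True, False]

# np.where takes tp['source'] where mask is True, else the scalar 'x'.
tp['col'] = np.where(mask, tp['source'], 'x')
print(tp['col'].tolist())  # ['a', 's', 'x']

# Pandas-only alternative: keep 'source' where mask holds, 'x' elsewhere.
tp['col2'] = tp['source'].where(mask, 'x')  # 'col2' is a hypothetical name
print(tp['col2'].tolist())  # ['a', 's', 'x']
```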