Long time user of this site but first time asking a question! Thanks to all of the benevolent users who have been answering questions for ages :)
I have been using df.apply lately and ideally want to pass a dataframe into the args parameter, something like so: df.apply(testFunc, args=(dfOther), axis = 1). My ultimate goal is to iterate over the dataframe I am passing in the args parameter, check logic against each row of the original dataframe, say df, and return some value from dfOther. So say I have a function like this:
def testFunc(row, dfOther):
    for index, rowOther in dfOther.iterrows():
        if row['A'] == rowOther[0] and row['B'] == rowOther[1]:
            return dfOther.at[index, 'C']

df['OTHER'] = df.apply(testFunc, args=(dfOther), axis = 1)
My current understanding is that args expects a Series object, and so if I actually run this I get the following error:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
However, before I wrote testFunc, which only takes a single dataframe, I had actually written priorTestFunc, which looks like this... And it works!
def priorTestFunc(row, dfOne, dfTwo):
    for index, rowOne in dfOne.iterrows():
        if row['A'] == rowOne[0] and row['B'] == rowOne[1]:
            return dfTwo.at[index, 'C']

df['OTHER'] = df.apply(priorTestFunc, args=(dfOne, dfTwo), axis = 1)
So, to my dismay, I have been getting into the habit of writing testFunc like so, and it has been working as intended:
def testFunc(row, dfOther, _):
    for index, rowOther in dfOther.iterrows():
        if row['A'] == rowOther[0] and row['B'] == rowOther[1]:
            return dfOther.at[index, 'C']

df['OTHER'] = df.apply(testFunc, args=(dfOther, _), axis = 1)
I would really appreciate if someone could let me know why this would be the case and maybe errors that I will be prone to, or maybe another alternative for solving this kind of problem!!
EDIT: As requested in the comments, my dfs generally look like the ones below. They will have two matching columns, and I will be returning a value from dfOther.at[index, column].
I have considered pd.concat([dfOther, df]); however, I will be running an algorithm testing conditions on df and then updating it accordingly from specific values in dfOther (which will also be updating), and I would like df to be relatively neat, as opposed to making a MultiIndex and throwing just about everything in it. Also, I am aware df.iterrows is in general slow, but these dataframes will be about 500 rows at the max, so scalability isn't really a massive concern for me at the moment.
df
Out[10]:
     A    B      C
0  foo  bur   6000
1  foo  bur   7000
2  foo  bur   8000
3  bar  kek   9000
4  bar  kek  10000
5  bar  kek  11000

dfOther
Out[12]:
     A    B     C
0  foo  bur  1000
1  foo  bur  2000
2  foo  bur  3000
3  bar  kek  4000
4  bar  kek  5000
5  bar  kek  6000
The error is in this line:
  File "C:\Anaconda3\envs\p2\lib\site-packages\pandas\core\frame.py", line 4017, in apply
    if kwds or args and not isinstance(func, np.ufunc):
Here, if kwds or args evaluates the truth value of the args passed to apply; for a tuple, that is True whenever it contains at least one element. Testing truthiness like this is a common way to check whether a container is empty:
l = []
if l:
    print("l is not empty!")
else:
    print("l is empty!")
l is empty!

l = [1]
if l:
    print("l is not empty!")
else:
    print("l is empty!")
l is not empty!
If you had passed a tuple to df.apply as args, the check would evaluate to True and there wouldn't be a problem. However, Python does not interpret (df) as a tuple:
type((df))
Out[39]: pandas.core.frame.DataFrame
It is just a DataFrame (an ordinary variable) inside parentheses. When you type if df:
if df:
    print("df is not empty")
Traceback (most recent call last):
  File "<ipython-input-40-c86da5a5f1ee>", line 1, in <module>
    if df:
  File "C:\Anaconda3\envs\p2\lib\site-packages\pandas\core\generic.py", line 887, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
You get the same error message. However, if you use a comma to indicate that it's a tuple, it works fine:
if (df, ):
    print("tuple is not empty")
tuple is not empty
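The snippets above depend on a df that already exists in the session. As a self-contained sketch of the same point, the following uses a small stand-in frame (the data and the variable names not_a_tuple, one_tuple and truth_error are made up for illustration) to show that parentheses alone do not create a tuple, while a trailing comma does:

```python
import pandas as pd

# Hypothetical stand-in frame; any DataFrame behaves the same way here.
df = pd.DataFrame({"A": ["foo", "bar"]})

not_a_tuple = (df)   # parentheses only group; this is still the DataFrame
one_tuple = (df,)    # the trailing comma makes a one-element tuple

print(type(not_a_tuple).__name__)
print(type(one_tuple).__name__, len(one_tuple))

# Asking a DataFrame for its truth value raises ValueError...
try:
    bool(not_a_tuple)
    truth_error = None
except ValueError as exc:
    truth_error = exc
print(truth_error)

# ...while a non-empty tuple is simply truthy, whatever it contains.
print(bool(one_tuple))
```

This also suggests why the args=(dfOne, dfTwo) and args=(dfOther, _) calls in the question did not fail: both already contain a comma, so they are real tuples.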
As a result, adding a comma to args=(dfOther), which turns it into a one-element tuple, should solve the problem.
df['OTHER'] = df.apply(testFunc, args=(dfOther, ), axis = 1)