python pandas: passing in dataframe to df.apply

Tags: python, pandas, dataframe

Long time user of this site but first time asking a question! Thanks to all of the benevolent users who have been answering questions for ages :)

I have been using df.apply lately and would ideally like to pass a dataframe in through the args parameter, something like this: df.apply(testFunc, args=(dfOther), axis = 1)

My ultimate goal is to iterate over the dataframe I pass in through args, check some logic against each row of the original dataframe (say df), and return a value from dfOther. Say I have a function like this:

def testFunc(row, dfOther):
    for index, rowOther in dfOther.iterrows():
        if row['A'] == rowOther[0] and row['B'] == rowOther[1]:
            return dfOther.at[index, 'C']

df['OTHER'] = df.apply(testFunc, args=(dfOther), axis = 1)

My current understanding is that args expects a Series object, and so if I actually run this we get the following error:

ValueError: The truth value of a DataFrame is ambiguous. 
Use a.empty, a.bool(), a.item(), a.any() or a.all().

However, before I wrote testFunc, which takes only a single dataframe, I had actually written priorTestFunc, which looks like this. And it works!

def priorTestFunc(row, dfOne, dfTwo):
    for index, rowOne in dfOne.iterrows():
        if row['A'] == rowOne[0] and row['B'] == rowOne[1]:
            return dfTwo.at[index, 'C']

df['OTHER'] = df.apply(priorTestFunc, args=(dfOne, dfTwo), axis = 1)

So, to my dismay, I have been getting into the habit of writing testFunc like so, and it has been working as intended:

def testFunc(row, dfOther, _):
    for index, rowOther in dfOther.iterrows():
        if row['A'] == rowOther[0] and row['B'] == rowOther[1]:
            return dfOther.at[index, 'C']

df['OTHER'] = df.apply(testFunc, args=(dfOther, _), axis = 1)

I would really appreciate it if someone could explain why this is the case, point out errors I may be prone to, or suggest an alternative way of solving this kind of problem!

EDIT: As requested in the comments, my dataframes generally look like the ones below. They have two matching columns, and I return a value from dfOther.at[index, column].

I have considered pd.concat([dfOther, df]); however, I will be running an algorithm that tests conditions on df and then updates it from specific values in dfOther (which will also be updated), and I would like df to stay relatively neat, as opposed to building a MultiIndex and throwing just about everything into it.

Also, I am aware that df.iterrows is slow in general, but these dataframes will have about 500 rows at most, so scalability isn't really a concern for me at the moment.

df
Out[10]: 
    A    B      C
0  foo  bur   6000
1  foo  bur   7000
2  foo  bur   8000
3  bar  kek   9000
4  bar  kek  10000
5  bar  kek  11000

dfOther
Out[12]: 
    A    B      C
0  foo  bur   1000
1  foo  bur   2000
2  foo  bur   3000
3  bar  kek   4000
4  bar  kek   5000
5  bar  kek   6000

asked Jun 04 '16 12:06

jboxxx

1 Answer

The error is in this line:

  File "C:\Anaconda3\envs\p2\lib\site-packages\pandas\core\frame.py", line 4017, in apply
    if kwds or args and not isinstance(func, np.ufunc):

Here, if kwds or args evaluates the truth value of args (kwds is empty in this call), which for a tuple means checking whether its length is greater than 0. This is a common way to check whether a container is empty:

l = []

if l:
    print("l is not empty!")
else:
    print("l is empty!")

l is empty!

l = [1]

if l:
    print("l is not empty!")
else:
    print("l is empty!")

l is not empty!

If you had passed a tuple to df.apply as args, it would evaluate as True (any non-empty tuple is truthy) and there wouldn't be a problem. However, Python does not interpret (df) as a tuple:

type((df))
Out[39]: pandas.core.frame.DataFrame
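
The same rule can be seen without pandas. The following is a minimal sketch with a plain integer (the variable names are only illustrative): it is the comma, not the parentheses, that creates a tuple.

```python
value = 42

not_a_tuple = (value)   # parentheses only group the expression; still an int
one_tuple = (value,)    # the trailing comma makes a one-element tuple
also_tuple = value,     # the parentheses are optional; the comma does the work

print(type(not_a_tuple).__name__)  # int
print(type(one_tuple).__name__)    # tuple
print(one_tuple == also_tuple)     # True
```

So args=(dfOther) hands apply the DataFrame itself, while args=(dfOther,) hands it a tuple containing the DataFrame.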

It is just a DataFrame/variable inside parentheses. When you type if df:

if df:
    print("df is not empty")

Traceback (most recent call last):

  File "<ipython-input-40-c86da5a5f1ee>", line 1, in <module>
    if df:

  File "C:\Anaconda3\envs\p2\lib\site-packages\pandas\core\generic.py", line 887, in __nonzero__
    .format(self.__class__.__name__))

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

You get the same error message. However, if you use a comma to indicate that it's a tuple, it works fine:

if (df, ):
    print("tuple is not empty")

tuple is not empty
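
To make the mechanism explicit, here is a small pandas-free sketch. AmbiguousFrame and has_extra_args are hypothetical stand-ins invented for illustration: the class mimics a DataFrame only in that it refuses to be used as a boolean, and the function mimics only the kwds or args part of the check in the traceback, not pandas' actual code.

```python
class AmbiguousFrame:
    """Hypothetical stand-in for a DataFrame: it refuses boolean evaluation."""

    def __bool__(self):
        raise ValueError("The truth value of an AmbiguousFrame is ambiguous.")


def has_extra_args(args, kwds):
    # Same shape as the `kwds or args` test from the traceback (ufunc part omitted).
    return bool(kwds or args)


frame = AmbiguousFrame()

# A tuple is truthy as soon as it has an element; the frame inside is never asked.
print(has_extra_args((frame,), {}))  # True
print(has_extra_args((), {}))        # False

# Without the comma, the frame itself is evaluated as a boolean and raises.
try:
    has_extra_args((frame), {})
except ValueError as exc:
    print("ValueError:", exc)
```

The truth value of a tuple depends only on its length, so whatever it contains is never evaluated as a boolean.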

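As a result, adding a trailing comma so that args=(dfOther) becomes the one-element tuple args=(dfOther, ) should solve the problem. This is also why priorTestFunc and the workaround with the extra _ argument worked: (dfOne, dfTwo) and (dfOther, _) contain a comma, so they are real tuples.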

df['OTHER'] = df.apply(testFunc, args=(dfOther, ), axis = 1)

answered Oct 19 '22 16:10

ayhan
