Pandas Boolean .any() .all()

Tags: python, pandas

I kept getting "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." when trying boolean tests with pandas. I didn't understand the message, so I decided to try to figure it out.

However, I am totally confused at this point.

Here I create a dataframe of two variables, with a single data point shared between them (3):

In [75]:

import pandas as pd

df = pd.DataFrame()

df['x'] = [1,2,3]
df['y'] = [3,4,5]

Now I try all(is x less than y), which I translate to "are all the values of x less than y", and I get an answer that doesn't make sense.

In [79]:

if all(df['x'] < df['y']):
    print('True')
else:
    print('False')
True

Next I try any(is x less than y), which I translate to "is any value of x less than y", and I get another answer that doesn't make sense.

In [77]:

if any(df['x'] < df['y']):
    print('True')
else:
    print('False')
False

In short: what do any() and all() actually do?

asked Jan 06 '15 03:01 by Anton



1 Answer

The error message suggests using the Series methods any() and all(), not Python's built-in functions of the same name.
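To illustrate what the error message is about, here is a minimal sketch of my own (not from the question), assuming a reasonably recent pandas and the same example data. The names mask, raised, all_less and any_less are only for illustration. A comparison between two Series yields a boolean Series, and using that Series directly in an if statement is what raises the ValueError; the methods reduce it to a single value.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 4, 5]})

# Element-wise comparison: the result is a boolean Series, not a single bool.
mask = df['x'] < df['y']

raised = False
try:
    if mask:  # asks for one truth value for the whole Series
        pass
except ValueError as exc:
    raised = True
    print(exc)  # the "truth value of a Series is ambiguous" message

# Reduce the Series to one value explicitly.
all_less = bool(mask.all())  # True only if every element is True
any_less = bool(mask.any())  # True if at least one element is True
print(all_less, any_less)
```

With this data every x is smaller than the y in the same row, so both reductions should come out True.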

I don't know where your strange output comes from (I get True in both cases with Python 2.7 and pandas 0.17.0). Try the following, which uses the Series.all() and Series.any() methods and should work:

import pandas as pd

df = pd.DataFrame()

df['x'] = [1,2,3]
df['y'] = [3,4,5]

print((df['x'] < df['y']).all()) # more pythonic way of
print((df['x'] < df['y']).any()) # doing the same thing

This should print:

True
True
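
As a rough cross-check (again my own sketch, assuming the built-in names any and all have not been rebound to something else in the session), the built-ins and the Series methods can be compared on the same data. The second mask, df['x'] < 2, is a hypothetical extra case added only to show a situation where all() and any() differ.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 4, 5]})

# Every x is smaller than the y in the same row: [True, True, True]
mask = df['x'] < df['y']
builtin_results = (all(mask), any(mask))               # built-ins iterate over the values
method_results = (bool(mask.all()), bool(mask.any()))  # Series methods

# Hypothetical extra case where only some elements are True: [True, False, False]
partial = df['x'] < 2
partial_results = (bool(partial.all()), bool(partial.any()))

print(builtin_results)  # (True, True)
print(method_results)   # (True, True)
print(partial_results)  # (False, True)
```

Calling the methods on the Series has the side benefit that the result does not depend on what the bare names any and all happen to refer to in the current namespace.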
answered Oct 09 '22 13:10 by Sergey Antopolskiy