I know this question has been asked before, but I am getting an error when I try to use an if
statement. I looked at this link, but it did not help much in my case. My dfs
is a list of DataFrames.
I am trying the following:
for i in dfs:
    if (i['var1'] < 3.000):
        print(i)
This gives the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I also tried the following and got the same error:
for i, j in enumerate(dfs):
    if (j['var1'] < 3.000):
        print(i)
My var1 data type is float32. I am not using any other logical operators such as and, & or |. In the above link, the error seemed to be caused by using logical operators. Why do I get a ValueError?
A ValueError in Python is raised when a function receives an argument of the right type but with an inappropriate value. It often occurs in mathematical operations that require a certain range of values. Imagine telling Python to take the square root of a negative number.
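As a minimal sketch of that square-root case, using only the standard library (the helper name safe_sqrt is made up for illustration):

```python
import math


def safe_sqrt(x):
    """Return the square root of x, or None if x is outside sqrt's domain."""
    try:
        return math.sqrt(x)
    except ValueError:
        # The argument has a valid type (a number) but an invalid value
        # (negative), so math.sqrt raises ValueError.
        return None


root_ok = safe_sqrt(9)
root_bad = safe_sqrt(-1)
```

Here root_ok holds the ordinary result, while root_bad is None because the ValueError was caught.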
To check for null values in a pandas DataFrame, use the isnull() function. It returns a DataFrame of Boolean values that are True wherever the original value is NaN.
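A small sketch of isnull() on made-up data (the column names and values below are assumptions, not taken from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value per column.
frame = pd.DataFrame({"var1": [1.5, np.nan, 4.0], "var2": ["a", None, "c"]})

# Boolean DataFrame with the same shape as `frame`.
null_mask = frame.isnull()

# Summing a Boolean mask counts the True entries in each column.
nulls_per_column = null_mask.sum()
```

Note that null_mask is itself a whole DataFrame of Booleans, so it cannot be placed directly in an if statement either; it has to be reduced first, for example with sum(), any() or all().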
NumPy raises a similar error: ValueError: The truth value of an array with more than one element is ambiguous. If the array has exactly one element, that element is evaluated as a bool value. For example, if the element is an integer (int), it is False if it is 0 and True otherwise.
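The following sketch contrasts the cases. The helper truth_or_error is hypothetical and exists only to capture the outcome. As far as I know, a pandas Series refuses the conversion even when it holds a single element, which is why the last line is included:

```python
import numpy as np
import pandas as pd


def truth_or_error(obj):
    """Return bool(obj), or the string 'ValueError' if the conversion is refused."""
    try:
        return bool(obj)
    except ValueError:
        return "ValueError"


zero_array = truth_or_error(np.array([0]))        # one element, value 0
five_array = truth_or_error(np.array([5]))        # one element, non-zero
two_elements = truth_or_error(np.array([0, 1]))   # more than one element
one_row_series = truth_or_error(pd.Series([True]))  # pandas Series, one element
```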
notnull is a pandas function that examines one or multiple values to check that they are not null. In pandas, null values are represented as NaN (not a number) or None to signify that no data is present. notnull returns False where NaN or None is detected and True otherwise.
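A brief sketch of notnull on scalars and on a Series. The values are invented for illustration:

```python
import numpy as np
import pandas as pd

# On scalars, pd.notnull returns a single Boolean.
scalar_checks = [pd.notnull(1.5), pd.notnull(np.nan), pd.notnull(None)]

# On a Series, it returns one Boolean per element.
values = pd.Series([2.0, np.nan, 0.5], dtype="float32")
present = values.notnull()

# The Boolean Series can be used as a mask to drop the missing entries.
clean = values[present]
```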
Here is a small demo that shows why this is happening:
In [131]: df = pd.DataFrame(np.random.randint(0,20,(5,2)), columns=list('AB'))
In [132]: df
Out[132]:
    A   B
0   3  11
1   0  16
2  16   1
3   2  11
4  18  15
In [133]: res = df['A'] > 10
In [134]: res
Out[134]:
0    False
1    False
2     True
3    False
4     True
Name: A, dtype: bool
When we try to check whether such a Series is True, pandas doesn't know what to do:
In [135]: if res:
     ...:     print(df)
     ...:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
...
skipped
...
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Workarounds:
We can decide how to treat a Series of boolean values. For example, if should return True if all values are True:
In [136]: res.all()
Out[136]: False
or when at least one value is True:
In [137]: res.any()
Out[137]: True
In [138]: if res.any():
     ...:     print(df)
     ...:
    A   B
0   3  11
1   0  16
2  16   1
3   2  11
4  18  15
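Applied to the loop from the question, this might look like the sketch below. The contents of dfs are made up here, since the original data is not shown. Only the column name var1, the float32 dtype and the threshold 3.000 come from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's list of DataFrames.
dfs = [
    pd.DataFrame({"var1": np.array([1.0, 2.5], dtype="float32")}),
    pd.DataFrame({"var1": np.array([2.0, 7.0], dtype="float32")}),
    pd.DataFrame({"var1": np.array([5.0, 9.0], dtype="float32")}),
]

all_below = []  # positions of frames where every var1 value is < 3
any_below = []  # positions of frames where at least one var1 value is < 3

for position, frame in enumerate(dfs):
    below = frame["var1"] < 3.000  # Boolean Series, one entry per row
    if below.all():
        all_below.append(position)
    if below.any():
        any_below.append(position)
```

Whether all() or any() is the right reduction depends on what the condition is supposed to mean for a whole DataFrame.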
Currently, you're selecting the entire series for comparison. To get an individual value from the series, you'll want to use something along the lines of:
for i in dfs:
    if (i['var1'].iloc[0] < 3.000):
        print(i)
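One caveat with .iloc[0] is that it raises IndexError on an empty DataFrame. A hedged sketch that guards against that case follows. The helper name first_below and its default arguments are assumptions for illustration:

```python
import pandas as pd


def first_below(frame, column="var1", limit=3.0):
    """Compare only the first row's value; return False for an empty frame."""
    if frame.empty:
        # .iloc[0] would raise IndexError here, so bail out early.
        return False
    # .iloc[0] yields a single scalar, so the comparison is unambiguous.
    return bool(frame[column].iloc[0] < limit)


small_first = pd.DataFrame({"var1": [1.0, 8.0]}, dtype="float32")
large_first = pd.DataFrame({"var1": [8.0, 1.0]}, dtype="float32")
no_rows = pd.DataFrame({"var1": []}, dtype="float32")

results = [first_below(f) for f in (small_first, large_first, no_rows)]
```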
To compare each of the individual elements, you can use Series.items (Series.iteritems in older releases; it was removed in pandas 2.0) like so:
for i in dfs:
    for _, v in i['var1'].items():
        if v < 3.000:
            print(v)
The better solution here for most cases is to select a subset of the dataframe to use for whatever you need, like so:
for i in dfs:
    subset = i[i['var1'] < 3.000]
    # do something with the subset
On large DataFrames, pandas is much faster when you use Series operations instead of iterating over individual values. For more detail, see the pandas documentation on selection.
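A self-contained sketch of the subset approach on invented data (the frames below are assumptions; only var1 and the 3.000 threshold come from the question):

```python
import pandas as pd

# Hypothetical stand-in for the question's list of DataFrames.
dfs = [
    pd.DataFrame({"var1": [1.0, 4.0, 2.5]}, dtype="float32"),
    pd.DataFrame({"var1": [3.5, 0.5]}, dtype="float32"),
]

# Boolean indexing keeps only the rows where the condition holds,
# without an explicit Python-level loop over the values.
subsets = [frame[frame["var1"] < 3.000] for frame in dfs]

# Optionally gather the matching rows from every frame into one DataFrame.
combined = pd.concat(subsets, ignore_index=True)
```

No if statement is needed here, so the ambiguous truth-value check never happens.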