I am learning pandas and got stuck with this problem here.
I created a dataframe that tracks all users and the number of times they did something.
To better understand the problem I created this example:
import pandas as pd
data = [
{'username': 'me', 'bought_apples': 2, 'bought_pears': 0},
{'username': 'you', 'bought_apples': 1, 'bought_pears': 1}
]
df = pd.DataFrame(data)
df['bought_something'] = df['bought_apples'] > 0 or df['bought_pears'] > 0
In the last line I want to add a column that indicates whether the user has bought anything at all.
This error pops up:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I understand the point of ambiguity in a pandas Series (also explained here) but I could not relate it to the problem.
Interestingly, this works:
df['bought_something'] = df['bought_apples'] > 0
Can anyone help me?
You can call sum row-wise and check whether the result is greater than 0:
In [105]:
df['bought_something'] = df[['bought_apples','bought_pears']].sum(axis=1) > 0
df
Out[105]:
   bought_apples  bought_pears  username  bought_something
0              2             0        me              True
1              1             1       you              True
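As a hedged variant of the row-wise sum, the sketch below compares each column with 0 first and then reduces with any(axis=1). The extra 'nobody' row and the names cols, by_sum and by_any are assumptions added for illustration so that a False result is visible; they are not part of the original question.

```python
import pandas as pd

df = pd.DataFrame([
    {'username': 'me', 'bought_apples': 2, 'bought_pears': 0},
    {'username': 'you', 'bought_apples': 1, 'bought_pears': 1},
    # Hypothetical extra row so that at least one result is False.
    {'username': 'nobody', 'bought_apples': 0, 'bought_pears': 0},
])

cols = ['bought_apples', 'bought_pears']

# Approach from the answer: sum the counts per row, then compare with 0.
by_sum = df[cols].sum(axis=1) > 0

# Variant: compare every cell with 0, then ask whether any cell in the row is True.
by_any = (df[cols] > 0).any(axis=1)

df['bought_something'] = by_any
print(df)
```

For non-negative counts the two approaches agree. If a column could hold negative values, the sum could cancel out, so the any(axis=1) form may be the safer choice in that case.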
Regarding your original attempt, the error message is telling you that the truth value of a whole Series is ambiguous: or expects a single True or False, but each comparison returns a boolean Series. If you want to or boolean conditions element-wise then you need to use the bit-wise operator | and wrap the conditions in parentheses due to operator precedence:
In [111]:
df['bought_something'] = ((df['bought_apples'] > 0) | (df['bought_pears'] > 0))
df
Out[111]:
   bought_apples  bought_pears  username  bought_something
0              2             0        me              True
1              1             1       you              True
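The reason for the error is that you use 'or' to join two boolean vectors (Series) instead of two boolean scalars. Python's 'or' needs a single truth value for its left operand, and a Series with several elements does not have one. That's why the message says it is ambiguous.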
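A minimal sketch of the difference, assuming two small boolean Series named apples and pears (hypothetical names, not taken from the question): 'or' tries to turn the whole left-hand Series into one bool and raises, while | combines the two Series element by element.

```python
import pandas as pd

# Each comparison yields a boolean Series, not a single bool.
apples = pd.Series([2, 1, 0]) > 0
pears = pd.Series([0, 1, 0]) > 0

error_message = None
try:
    # 'or' needs one truth value for 'apples', so pandas raises ValueError.
    result = apples or pears
except ValueError as exc:
    error_message = str(exc)
    print("or failed:", error_message)

# '|' works element-wise and returns a new boolean Series.
combined = apples | pears
print(combined.tolist())
```

The same reasoning applies to 'and' versus &, and to 'not' versus ~.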