Appending Boolean Column in Pandas DataFrame

I am learning pandas and got stuck with this problem here.

I created a dataframe that tracks all users and the number of times they did something.

To illustrate the problem, I created this example:

import pandas as pd
data = [
    {'username': 'me',  'bought_apples': 2, 'bought_pears': 0},
    {'username': 'you', 'bought_apples': 1, 'bought_pears': 1}
]
df = pd.DataFrame(data)
df['bought_something'] = df['bought_apples'] > 0 or df['bought_pears'] > 0

In the last line I want to add a column that indicates whether the user has bought anything at all.

This error pops up:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I understand the point about ambiguity in a pandas Series (also explained here), but I could not relate it to this problem.
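
A minimal sketch of where the error comes from: Python's `or` calls `bool()` on its left operand, and pandas refuses to turn a multi-element Series into a single truth value. The reductions named in the message, `.any()` and `.all()`, collapse the Series to one boolean instead. The Series `s` here is just an illustrative stand-in for `df['bought_apples'] > 0`.

```python
import pandas as pd

s = pd.Series([True, False])  # illustrative stand-in for a comparison result

# `s or other` first evaluates bool(s); pandas raises instead of guessing
error_message = ""
try:
    bool(s)
except ValueError as exc:
    error_message = str(exc)

# The reductions suggested by the message each return a single boolean
any_true = s.any()   # at least one element is True
all_true = s.all()   # every element is True

print(error_message)
print(any_true, all_true)
```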

Interestingly, this works:

df['bought_something'] = df['bought_apples'] > 0
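
The single comparison works because `>` is applied element-wise and returns a boolean Series of the same length, so no single truth value is ever needed; pandas simply stores that Series as the new column. A quick check using the example data from the question:

```python
import pandas as pd

df = pd.DataFrame([
    {'username': 'me',  'bought_apples': 2, 'bought_pears': 0},
    {'username': 'you', 'bought_apples': 1, 'bought_pears': 1},
])

# Element-wise comparison: one boolean per row, no truth value of the whole Series required
mask = df['bought_apples'] > 0
print(mask.dtype)      # bool
print(mask.tolist())   # [True, True]

df['bought_something'] = mask
```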

Can anyone help me?

asked Jun 18 '15 at 10:06 by linqu



2 Answers

You can sum the columns row-wise and check whether the total is greater than 0:

In [105]:
df['bought_something'] = df[['bought_apples','bought_pears']].sum(axis=1) > 0
df

Out[105]:
   bought_apples  bought_pears username bought_something
0              2             0       me             True
1              1             1      you             True
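
A closely related variant, sketched here as an alternative rather than as part of the answer above: compare every count column to 0 first, then ask whether any comparison in the row is True, using `DataFrame.gt` and `DataFrame.any(axis=1)`. The third user `them` is a hypothetical row added only so the result contains a False.

```python
import pandas as pd

df = pd.DataFrame([
    {'username': 'me',   'bought_apples': 2, 'bought_pears': 0},
    {'username': 'you',  'bought_apples': 1, 'bought_pears': 1},
    {'username': 'them', 'bought_apples': 0, 'bought_pears': 0},  # hypothetical row
])

count_cols = ['bought_apples', 'bought_pears']

# gt(0) gives a boolean DataFrame; any(axis=1) reduces each row to a single boolean
df['bought_something'] = df[count_cols].gt(0).any(axis=1)
print(df)
```

This reads directly as "at least one count is positive" and extends to more `bought_*` columns by adding them to `count_cols`.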

Regarding your original attempt, the error occurs because 'or' has to reduce df['bought_apples'] > 0 to a single True or False, and the truth value of a whole Series is ambiguous. If you want to combine boolean conditions element-wise, you need to use the bitwise operator | and wrap each condition in parentheses, because | has higher precedence than >:

In [111]:
df['bought_something'] = ((df['bought_apples'] > 0) | (df['bought_pears'] > 0))
df

Out[111]:
   bought_apples  bought_pears username bought_something
0              2             0       me             True
1              1             1      you             True
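
The parentheses matter because | binds more tightly than >. Without them, Python reads the expression as the chained comparison `df['bought_apples'] > (0 | df['bought_pears']) > 0`, which again needs the truth value of a Series and fails with the same ValueError. A minimal sketch using the question's two count columns:

```python
import pandas as pd

df = pd.DataFrame({'bought_apples': [2, 1], 'bought_pears': [0, 1]})

# Without parentheses: parsed as a chained comparison, which calls bool() on a Series
try:
    _ = df['bought_apples'] > 0 | df['bought_pears'] > 0
    unparenthesized_error = None
except ValueError as exc:
    unparenthesized_error = exc

# With parentheses: each comparison is evaluated first, then combined element-wise
combined = (df['bought_apples'] > 0) | (df['bought_pears'] > 0)
print(type(unparenthesized_error).__name__)  # ValueError
print(combined.tolist())                      # [True, True]
```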

answered Oct 04 '22 at 18:10 by EdChum


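The error occurs because you use 'or' to join two boolean vectors rather than two boolean scalars. 'or' needs a single truth value for each operand, and a Series with more than one element has no unambiguous truth value. That is why the message says it is ambiguous.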
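
To see the distinction, a small sketch: 'or' is fine when both sides are plain booleans, but two boolean Series need an element-wise OR, either the | operator or `numpy.logical_or`. The third element (0 apples, 0 pears) is hypothetical, added so the result contains a False.

```python
import numpy as np
import pandas as pd

apples = pd.Series([2, 1, 0])  # third value is hypothetical
pears = pd.Series([0, 1, 0])   # third value is hypothetical

# Scalars: `or` needs only one truth value per operand, so this is fine
scalar_result = (2 > 0) or (0 > 0)

# Series: combine the two boolean vectors element by element instead
with_operator = (apples > 0) | (pears > 0)
with_ufunc = np.logical_or(apples > 0, pears > 0)

print(scalar_result)           # True
print(with_operator.tolist())  # [True, True, False]
print(with_ufunc.tolist())     # [True, True, False]
```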

answered Oct 04 '22 at 18:10 by Jianxun Li