
Create a Boolean column based on a condition

I have a dataframe of 11 columns and I want to create a new 0,1 column based on values in two of those columns.

I have already used np.where to create other columns, but it doesn't work for this one.

train["location"] = np.where(3750901.5068 <= train["x"] <= 3770901.5068 
and -19268905.6133 <= train['y'] <= -19208905.6133, 1, 0)

I get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

principe asked Jan 01 '23 02:01



2 Answers

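You can use isin here, but note that it tests whether each value equals one of the values in a list; it does not test whether a value falls within a range, so it only fits if you want to match specific values exactly. Either way, you need the element-wise & operator instead of "and", with each condition in parentheses. Documentation for pandas.DataFrame.isin: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html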

For example:

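import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [100, 110, 120, 111, 109], 'b': [120, 345, 124, 119, 127]})
df['c'] = np.where(df['a'].isin([100, 111]) & df['b'].isin([120, 128]), 1, 0)  # 1 only where both columns match a listed value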

In your case it would be (this matches only the exact values listed, not everything between them):

train["location"] = np.where(train["x"].isin([3750901.5068, 3770901.5069]) & train["y"].isin([-19268905.6133, -19268905.6132]), 1, 0)
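To make the difference between a membership test and a range test concrete, here is a minimal sketch. The DataFrame and its values are hypothetical and chosen only for illustration; they are not the asker's data.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({"x": [1.0, 2.5, 4.0, 5.0]})

# isin: True only where the value equals one of the listed values.
exact = df["x"].isin([1.0, 4.0])

# between: True where the value lies in the closed interval [1.0, 4.0].
in_range = df["x"].between(1.0, 4.0)

print(exact.tolist())     # [True, False, True, False]
print(in_range.tolist())  # [True, True, True, False]
```

The value 2.5 lies inside the interval but is not in the list, so only between marks it as True.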
Kartikeya Sharma answered Jan 04 '23 22:01



I'm not sure you even need np.where here. To combine two Series with an element-wise logical AND, use & instead of and. See: Logical operators for boolean indexing in Pandas

Also, Python evaluates the chained comparison 3750901.5068 <= train["x"] <= 3770901.5068 as (3750901.5068 <= train["x"]) and (train["x"] <= 3770901.5068), which again uses and and won't work. So you'll need to either split each one up explicitly, e.g. (3750901.5068 <= train["x"]) & (train["x"] <= 3770901.5068), or use Series.between, e.g. train["x"].between(3750901.5068, 3770901.5068). By default between includes both endpoints; passing inclusive=True was deprecated in pandas 1.3 in favor of inclusive="both". See: How to select rows in a DataFrame between two values, in Python Pandas?
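A minimal sketch of the three variants, using a small hypothetical DataFrame and made-up bounds rather than the asker's data: the chained comparison raises the ValueError from the question, while the split form and Series.between produce the same mask.

```python
import pandas as pd

# Hypothetical data and bounds, for illustration only.
train = pd.DataFrame({"x": [1.0, 5.0, 9.0]})

error_message = None
try:
    # Chained comparison: Python inserts an implicit `and`,
    # which asks for the truth value of a whole Series.
    mask = 2.0 <= train["x"] <= 8.0
except ValueError as exc:
    error_message = str(exc)

# Option 1: split the comparison and combine with element-wise &.
split_mask = (2.0 <= train["x"]) & (train["x"] <= 8.0)

# Option 2: Series.between, inclusive on both ends by default.
between_mask = train["x"].between(2.0, 8.0)

print(error_message)
print(split_mask.tolist())    # [False, True, False]
print(between_mask.tolist())  # [False, True, False]
```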

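If you write the comparisons out yourself, you'll also need parentheses around each of the two operands of &, because & binds more tightly than <=. Method calls such as between don't need the extra parentheses.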
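The precedence issue can be seen with plain Python integers, which keeps the example free of pandas details. The numbers here are arbitrary and purely illustrative.

```python
# Without parentheses, & is applied first:
# 1 <= 2 & 4 <= 8 is parsed as 1 <= (2 & 4) <= 8, and 2 & 4 is 0.
without_parens = 1 <= 2 & 4 <= 8

# With parentheses, each comparison is evaluated first, then combined.
with_parens = (1 <= 2) & (4 <= 8)

print(without_parens)  # False
print(with_parens)     # True
```

The same grouping rule applies when the operands are Series, which is why each comparison needs its own parentheses.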

So the end result should look like

train["location"] = train["x"].between(3750901.5068, 3770901.5068) & train["y"].between(-19268905.6133, -19208905.6133)

This will give you a Series of bools (True and False). These already behave like 1s and 0s in arithmetic, so sums and means work directly. If you really want an integer column of 0s and 1s, convert it, for example: train["location"] = train["location"].astype(int)
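Putting the pieces together, here is a short end-to-end sketch that builds the mask and converts it to 0/1. The data and the bounds are hypothetical stand-ins for the asker's x and y columns.

```python
import pandas as pd

# Hypothetical data; the real columns hold much larger coordinates.
train = pd.DataFrame({"x": [1.0, 5.0, 9.0], "y": [-3.0, -1.0, -2.0]})

# Boolean mask: True where both x and y fall inside their ranges.
in_box = train["x"].between(2.0, 8.0) & train["y"].between(-2.0, 0.0)

# Booleans count as 1 and 0 in arithmetic, so sum() counts the matches.
n_inside = int(in_box.sum())

# Explicit integer column, if 0/1 is preferred over True/False.
train["location"] = in_box.astype(int)

print(n_inside)                     # 1
print(train["location"].tolist())  # [0, 1, 0]
```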

Kevin Wang answered Jan 04 '23 23:01
