Pandas equivalent of SQL case when statement to create new variable

Tags: python, pandas

I have this df:

import numpy as np
import pandas as pd

data = np.array([[np.nan, 0], [2, 0], [np.nan, 1]])
df = pd.DataFrame(data=data, columns=['a', 'b'])

which looks like this:

     a    b
    --------
0   NaN  0.0
1   2.0  0.0
2   NaN  1.0

My goal is to create a third column "c" that is 1 when column "a" is NaN and column "b" is 0, and 0 otherwise. In SQL this would be a simple CASE expression:

(CASE WHEN a IS NULL AND b = 0 THEN 1 ELSE 0 END) AS C

The desired output is this:

     a    b   c
    -----------
0   NaN  0.0  1
1   2.0  0.0  0
2   NaN  1.0  0

My (wrong) attempt:

df['c'] = np.where(df['a']==np.nan & df['b'] == 0, 1, 0)
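A minimal sketch of why this attempt fails, rebuilding the same df. Comparing with np.nan is False on every row because NaN never equals anything, not even itself. Separately, & binds more tightly than ==, so the unparenthesized condition is parsed as the chained comparison df['a'] == (np.nan & df['b']) == 0, which raises an error; the exact exception type may depend on the pandas version, so the sketch accepts either TypeError or ValueError.

```python
import numpy as np
import pandas as pd

data = np.array([[np.nan, 0], [2, 0], [np.nan, 1]])
df = pd.DataFrame(data=data, columns=['a', 'b'])

# Problem 1: NaN is never equal to anything (not even NaN),
# so comparing with np.nan gives False on every row.
nan_check = (df['a'] == np.nan).tolist()
print(nan_check)  # [False, False, False]

# Problem 2: & binds more tightly than ==, so the original condition becomes
# the chained comparison  df['a'] == (np.nan & df['b']) == 0 , which raises.
try:
    df['a'] == np.nan & df['b'] == 0
    precedence_error = None
except (TypeError, ValueError) as exc:
    precedence_error = type(exc).__name__
print(precedence_error)

# Fix: test for NaN with isnull() and parenthesize each comparison.
mask = df['a'].isnull() & (df['b'] == 0)
print(mask.tolist())  # [True, False, False]
```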

Many thanks.

asked Jul 05 '17 08:07 by Antonio Serrano





3 Answers

For more control over the conditions, use np.select. It is very similar to CASE WHEN and scales up to multiple conditions and outputs:

df['c'] = np.select(
    [df['a'].isnull() & (df['b'] == 0)],  # conditions (the WHEN clauses)
    [1],                                   # values (the THEN clauses)
    default=0,                             # the ELSE value
)
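np.select evaluates its conditions in order and, for each row, takes the value of the first condition that is True, which is what lets it scale like a multi-branch CASE WHEN. A minimal sketch on the question's df; the extra branch and the labels 'missing_zero', 'missing' and 'present' are hypothetical and only for illustration.

```python
import numpy as np
import pandas as pd

data = np.array([[np.nan, 0], [2, 0], [np.nan, 1]])
df = pd.DataFrame(data=data, columns=['a', 'b'])

# Hypothetical multi-branch rule, for illustration only:
#   CASE WHEN a IS NULL AND b = 0 THEN 'missing_zero'
#        WHEN a IS NULL           THEN 'missing'
#        ELSE 'present' END
conditions = [
    df['a'].isnull() & (df['b'] == 0),
    df['a'].isnull(),
]
choices = ['missing_zero', 'missing']

# Like CASE WHEN, the first True condition wins for each row.
df['label'] = np.select(conditions, choices, default='present')
print(df['label'].tolist())  # ['missing_zero', 'present', 'missing']
```

Order matters: if the two conditions were swapped, row 0 would match the broader a IS NULL branch first and get 'missing'.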
answered Oct 18 '22 05:10 by vishnu_tushaar



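You're almost there. Use isnull() for the null check, since comparing with np.nan is always False, and wrap df['b'] == 0 in parentheses because & binds more tightly than ==: np.where(df['a'].isnull() & (df['b'] == 0), 1, 0)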

Alternatively,

In [258]: df['c'] = (df['a'].isnull() & (df['b'] == 0)).astype(int)

In [259]: df
Out[259]:
     a    b  c
0  NaN  0.0  1
1  2.0  0.0  0
2  NaN  1.0  0
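Both approaches start from the same boolean mask: np.where picks 1 or 0 element-wise, while astype(int) relies on True and False converting to 1 and 0. A small sketch comparing the two on the question's df (the names via_where and via_astype are just illustrative):

```python
import numpy as np
import pandas as pd

data = np.array([[np.nan, 0], [2, 0], [np.nan, 1]])
df = pd.DataFrame(data=data, columns=['a', 'b'])

# The shared condition: a IS NULL AND b = 0
mask = df['a'].isnull() & (df['b'] == 0)

# np.where chooses 1 or 0 element-wise; astype(int) maps True -> 1, False -> 0.
via_where = pd.Series(np.where(mask, 1, 0), index=df.index)
via_astype = mask.astype(int)

print(via_where.tolist(), via_astype.tolist())  # [1, 0, 0] [1, 0, 0]
```

astype(int) is shorter when the outputs are exactly 1 and 0; np.where generalizes to any pair of output values.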
answered Oct 18 '22 05:10 by Zero



You can't check whether a Series value is NaN by comparing it with np.nan; use Series.isnull() instead.

The code below gives the desired output:

df['c'] = np.where(df['a'].isnull() & np.array(df['b'] == 0), 1, 0)
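A small sketch of why the comparison with np.nan does not work: NaN compares unequal to everything, itself included, so only the missing-value helpers detect it. The Series s is a made-up example; pandas stores None as NaN in a float column, and isna() is an alias of isnull().

```python
import numpy as np
import pandas as pd

# Scalar level: NaN is not equal to itself, but pd.isna recognizes it.
scalar_equal = np.nan == np.nan
scalar_isna = pd.isna(np.nan)
print(scalar_equal, scalar_isna)  # False True

# Made-up float Series: None is stored as NaN in a float column.
s = pd.Series([1.0, None, np.nan])

by_equality = (s == np.nan).tolist()  # finds nothing
by_isnull = s.isnull().tolist()
by_isna = s.isna().tolist()           # isna() is an alias of isnull()
print(by_equality)  # [False, False, False]
print(by_isnull)    # [False, True, True]
print(by_isna)      # [False, True, True]
```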
answered Oct 18 '22 05:10 by Akshay Kandul
