I have this df:
import numpy as np
import pandas as pd

data = np.array([[np.nan, 0], [2, 0], [np.nan, 1]])
df = pd.DataFrame(data=data, columns=['a', 'b'])
which looks like this:
a b
--------
0 NaN 0.0
1 2.0 0.0
2 NaN 1.0
My goal is to create a third column "c" that has a value of 1 when column "a" is equal to NaN and column "b" is equal to 0. "c" would be 0 otherwise. The simple SQL case statement would be:
(CASE WHEN a IS NULL AND b = 0 THEN 1 ELSE 0 END) AS C
The desired output is this:
a b c
-----------
0 NaN 0.0 1
1 2.0 0.0 0
2 NaN 1.0 0
My (wrong) try:
df['c'] = np.where(df['a']==np.nan & df['b'] == 0, 1, 0)
Many thanks.
#2 – Apply Function in Pandas: it is one of the commonly used pandas functions for manipulating a DataFrame and creating new variables.
Just type the name of your dataframe, call the method, and then provide the name-value pairs for each new variable, separated by commas. Adding multiple variables to a pandas dataframe is really easy.
For more control over the conditions, use np.select. It works much like CASE WHEN and scales to multiple conditions and outputs.
df['c'] = np.select(
[
(df['a'].isnull() & (df['b'] == 0))
],
[
1
],
default=0 )
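To illustrate how np.select scales to several branches, here is a minimal sketch. The second condition and the output code 2 are made up for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[np.nan, 0], [2, 0], [np.nan, 1]]), columns=['a', 'b'])

# Conditions are checked in order and the first match wins, like CASE WHEN.
conditions = [
    df['a'].isnull() & (df['b'] == 0),  # a IS NULL AND b = 0
    df['a'].isnull() & (df['b'] == 1),  # hypothetical second branch
]
choices = [1, 2]

# Rows matching no condition get the default, like the ELSE clause.
df['c'] = np.select(conditions, choices, default=0)
print(df)
```

Each entry in conditions pairs with the entry at the same position in choices, so adding a branch means appending one item to each list.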
You're almost there. Use isnull() for the null check, and wrap each comparison in parentheses: np.where(df['a'].isnull() & (df['b'] == 0), 1, 0)
Alternatively,
In [258]: df['c'] = (df['a'].isnull() & (df['b'] == 0)).astype(int)
In [259]: df
Out[259]:
a b c
0 NaN 0.0 1
1 2.0 0.0 0
2 NaN 1.0 0
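As a quick sanity check, the np.where, np.select and astype(int) approaches should all produce the same column. A small sketch, with variable names chosen here for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[np.nan, 0], [2, 0], [np.nan, 1]]), columns=['a', 'b'])

# Build the boolean mask once and reuse it in each approach.
mask = df['a'].isnull() & (df['b'] == 0)

via_where = np.where(mask, 1, 0)
via_select = np.select([mask], [1], default=0)
via_astype = mask.astype(int).to_numpy()

# All three give the same 0/1 values.
assert (via_where == via_select).all()
assert (via_where == via_astype).all()
```

For a single condition with 0/1 outputs, the astype(int) form is probably the shortest; np.where and np.select become more useful when the outputs are not simply 0 and 1.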
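You can't check whether a Series value is NaN by comparing it with np.nan, because NaN never compares equal to anything, including itself. Use Series.isnull() instead.
The code below gives the desired output: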
df['c'] = np.where(df['a'].isnull() & np.array(df['b'] == 0),1,0)
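To see why the == np.nan comparison in the question never matches, here is a small sketch (the variable names are only for illustration). Note also that & binds more tightly than == in Python, which is why each comparison needs its own parentheses.

```python
import numpy as np
import pandas as pd

a = pd.Series([np.nan, 2.0, np.nan])

# NaN is not equal to itself, so an equality test can never find it.
print(np.nan == np.nan)

# Element-wise comparison against np.nan is False everywhere.
eq_mask = a == np.nan

# isnull() is the supported way to detect missing values.
null_mask = a.isnull()

print(eq_mask.tolist())
print(null_mask.tolist())
```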