
Pandas, conditional column assignment based on column values

How can I do a conditional assignment in pandas based on the values of two columns? Conceptually, something like the following:

Column_D = Column_B / (Column_B + Column_C) if Column_C is not null else Column_C

Concrete example:

import pandas as pd
import numpy as np
df = pd.DataFrame({'b': [2,np.nan,4,2,np.nan], 'c':[np.nan,1,2,np.nan,np.nan]})


     b    c
0  2.0  NaN
1  NaN  1.0
2  4.0  2.0
3  2.0  NaN
4  NaN  NaN

I want a new column d whose value is column b divided by the sum of b and c if c is not null; otherwise it should be the value in column c. Conceptually, something like the following:

df['d'] = df['b']/(df['b']+df['c']) if not df['c'].isnull() else df['c']

Desired result:

     b    c         d
0  2.0  NaN       NaN
1  NaN  1.0       1.0
2  4.0  2.0       0.66
3  2.0  NaN       NaN
4  NaN  NaN       NaN

How can I achieve this?

CentAu asked Jul 28 '16 17:07



1 Answer

Try this if you want to reproduce your desired result set (it checks column b):

In [30]: df['d'] = np.where(df.b.notnull(), df.b/(df.b+df.c), df.c)

In [31]: df
Out[31]:
     b    c         d
0  2.0  NaN       NaN
1  NaN  1.0  1.000000
2  4.0  2.0  0.666667
3  2.0  NaN       NaN
4  NaN  NaN       NaN
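
For comparison, here is a minimal self-contained sketch of the same column-b check written with pandas' own Series.where instead of np.where. The intermediate name ratio is my own and not part of the original answer; treat this as one possible equivalent, not the only way to write it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'b': [2, np.nan, 4, 2, np.nan],
                   'c': [np.nan, 1, 2, np.nan, np.nan]})

# NaN in either operand propagates, so ratio is NaN wherever b or c is missing.
# ("ratio" is an illustrative name, not from the original answer.)
ratio = df['b'] / (df['b'] + df['c'])

# Series.where keeps ratio where the condition is True (b is present)
# and takes the value from c everywhere else.
df['d'] = ratio.where(df['b'].notna(), df['c'])
print(df)
```

Series.where keeps the result as a pandas Series with the original index, whereas np.where returns a plain NumPy array; for a simple column assignment like this one, either should work.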

Or this, which checks column c as in your formula (row 1 then yields NaN, because b is NaN there):

In [32]: df['d'] = np.where(df.c.notnull(), df.b/(df.b+df.c), df.c)

In [33]: df
Out[33]:
     b    c         d
0  2.0  NaN       NaN
1  NaN  1.0       NaN
2  4.0  2.0  0.666667
3  2.0  NaN       NaN
4  NaN  NaN       NaN
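
A side note on the column-c variant, as a hedged sketch: because NaN propagates through arithmetic, the division is already NaN wherever c is null, and the fallback value df.c is NaN in exactly those rows too. So for this data the check on c appears to be redundant, and the plain division gives the same column. The names ratio, checked_c, and same below are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'b': [2, np.nan, 4, 2, np.nan],
                   'c': [np.nan, 1, 2, np.nan, np.nan]})

# Plain division, with no condition at all.
ratio = df['b'] / (df['b'] + df['c'])

# The column-c check from above, wrapped back into a Series for comparison.
checked_c = pd.Series(np.where(df['c'].notnull(), ratio, df['c']),
                      index=df.index)

# Series.equals treats NaNs in the same positions as equal.
same = checked_c.equals(ratio)
print(same)
```

This only holds because the fallback is column c itself; with a different fallback value (a constant, say), the explicit condition would matter.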
MaxU - stop WAR against UA answered Oct 02 '22 17:10