Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Pandas: Create new column out of other columns where value is not null

I have a data frame like this

----------------
RecID| A  |B
----------------
1    |NaN | x 
2    |y   | NaN 
3    |z   | NaN
4    |NaN | a 
5    |NaN | b 

And I want to create a new column, C, from A and B such that if A is null then fill with B and if B is null then fill with A:

----------------------
RecID|A   |B    |C 
----------------------
1    |NaN | x   |x
2    |y   | NaN |y 
3    |z   | NaN |z
4    |NaN | a   |a
5    |NaN | b   |b

Lastly, is there an efficient way to do this if I have more than two columns, e.g. I have columns A-Z and want create a new column A1 out of columns A-Z similar to above?

like image 591
swang16 Avatar asked May 03 '17 16:05

swang16


1 Answers

pandas
lookup
This is the generalizable solution OP was looking for and will work across an arbitrary number of columns.

lookup = df.loc[:, 'A':'B'].notnull().idxmax(1)
df.assign(A1=df.lookup(lookup.index, lookup.values))

   RecID    A    B A1
0      1  NaN    x  x
1      2    y  NaN  y
2      3    z  NaN  z
3      4  NaN    a  a
4      5  NaN    b  b

fillna

df.assign(C=df.A.fillna(df.B))

   RecID    A    B  C
0      1  NaN    x  x
1      2    y  NaN  y
2      3    z  NaN  z
3      4  NaN    a  a
4      5  NaN    b  b

mask

df.assign(C=df.A.mask(df.A.isnull(), df.B))

   RecID    A    B  C
0      1  NaN    x  x
1      2    y  NaN  y
2      3    z  NaN  z
3      4  NaN    a  a
4      5  NaN    b  b

combine_first

df.assign(C=df.A.combine_first(df.B))

   RecID    A    B  C
0      1  NaN    x  x
1      2    y  NaN  y
2      3    z  NaN  z
3      4  NaN    a  a
4      5  NaN    b  b

numpy
np.where

df.assign(C=np.where(df.A.notnull(), df.A, df.B))

   RecID    A    B  C
0      1  NaN    x  x
1      2    y  NaN  y
2      3    z  NaN  z
3      4  NaN    a  a
4      5  NaN    b  b
like image 61
piRSquared Avatar answered Oct 05 '22 22:10

piRSquared