Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replacing non-null values with column names

Given the following data frame:

import pandas as pd
d = pd.DataFrame({'a':[1,2,3],'b':[np.nan,5,6]})
d
    a   b
0   1   NaN
1   2   5.0
2   3   6.0

I would like to replace all non-null values with the column name.

Desired result:

    a   b
0   a   NaN
1   a   b
2   a   b

In reality, I have many columns.

Thanks in advance!

Update to answer from root: To perform this on a subset of columns:

d.loc[:,d.columns[3:]] = np.where(d.loc[:,d.columns[3:]].notnull(), d.loc[:,d.columns[3:]].columns, d.loc[:,d.columns[3:]])
like image 993
Dance Party2 Avatar asked Aug 22 '17 18:08

Dance Party2


2 Answers

Using numpy.where and notnull:

d[:] = np.where(d.notnull(), d.columns, d)

The resulting output:

   a    b
0  a  NaN
1  a    b
2  a    b

Edit

To select specific columns:

cols = d.columns[3:]  # or whatever Index/list-like of column names
d[cols] = np.where(d[cols].notnull(), cols, d[cols])
like image 60
root Avatar answered Oct 30 '22 10:10

root


I can think of one possibility using apply/transform:

In [1610]: d.transform(lambda x: np.where(x.isnull(), x, x.name))
Out[1610]: 
   a    b
0  a  nan
1  a    b
2  a    b

You could also use df.where:

In [1627]: d.where(d.isnull(), d.columns.values.repeat(len(d)).reshape(d.shape))
Out[1627]: 
   a    b
0  a  NaN
1  a    b
2  b    b
like image 40
cs95 Avatar answered Oct 30 '22 12:10

cs95