Given the following data frame:
import numpy as np
import pandas as pd
d = pd.DataFrame({'a':[1,2,3],'b':[np.nan,5,6]})
d
   a    b
0  1  NaN
1  2  5.0
2  3  6.0
I would like to replace all non-null values with the column name.
Desired result:
   a    b
0  a  NaN
1  a    b
2  a    b
In reality, I have many columns.
Thanks in advance!
Update, based on the answer from root: to perform this on a subset of columns:
d.loc[:,d.columns[3:]] = np.where(d.loc[:,d.columns[3:]].notnull(), d.loc[:,d.columns[3:]].columns, d.loc[:,d.columns[3:]])
Using numpy.where and notnull:
d[:] = np.where(d.notnull(), d.columns, d)
The resulting output:
   a    b
0  a  NaN
1  a    b
2  a    b
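If you would rather not overwrite d in place, a minimal variant of the same np.where call wraps the result in a new DataFrame that reuses d's index and column labels. The name labelled is just illustrative:

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'a': [1, 2, 3], 'b': [np.nan, 5, 6]})

# Build a new frame instead of assigning into d[:], so the original
# int/float columns of d are left untouched.
labelled = pd.DataFrame(
    np.where(d.notnull(), d.columns, d),  # column names broadcast across rows
    index=d.index,
    columns=d.columns,
)
print(labelled)
```

The broadcasting works because d.columns has one entry per column, so NumPy repeats it down every row of the (rows, columns) condition array.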
Edit
To select specific columns:
cols = d.columns[3:] # or whatever Index/list-like of column names
d[cols] = np.where(d[cols].notnull(), cols, d[cols])
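The same idea can be packaged as a small helper. fill_with_column_names is a hypothetical name, not a pandas function; it works on a copy and defaults to all columns. The extra column c below is made-up data, only there to show a subset being processed while column a is left alone:

```python
import numpy as np
import pandas as pd

def fill_with_column_names(df, cols=None):
    """Return a copy of df where non-null cells in `cols` hold their column name.

    `cols` defaults to every column; NaN cells are left as NaN.
    """
    cols = df.columns if cols is None else pd.Index(cols)
    out = df.copy()
    # Where the mask is False (the nulls), np.where takes the original NaN.
    out[cols] = np.where(df[cols].notnull(), cols, df[cols])
    return out

d = pd.DataFrame({'a': [1, 2, 3], 'b': [np.nan, 5, 6], 'c': [7, np.nan, 9]})
print(fill_with_column_names(d, ['b', 'c']))
```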
I can think of one possibility using apply/transform:
In [1610]: d.transform(lambda x: np.where(x.isnull(), x, x.name))
Out[1610]:
   a    b
0  a  nan
1  a    b
2  a    b
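The lowercase nan in the output above suggests that np.where has turned column b into an array of strings, so the missing value comes back as the string 'nan' rather than a real NaN. A sketch that keeps genuine NaNs swaps np.where for pandas' own Series.where inside the same transform call:

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'a': [1, 2, 3], 'b': [np.nan, 5, 6]})

# Series.where keeps values where the condition is True and uses `other`
# elsewhere: nulls are kept, every other cell becomes the column name.
labelled = d.transform(lambda x: x.where(x.isnull(), x.name))
print(labelled)
```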
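You could also use df.where. The repeated column names have to be reshaped in column-major order (order='F') so that each column is filled with its own name:
In [1627]: d.where(d.isnull(), d.columns.values.repeat(len(d)).reshape(d.shape, order='F'))
Out[1627]:
   a    b
0  a  NaN
1  a    b
2  a    b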
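Another way to build the grid of names for df.where is np.tile, which repeats the row of column names once per row of d, so there is no reshape order to get wrong. This is a sketch on the example frame; names and labelled are illustrative variable names:

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'a': [1, 2, 3], 'b': [np.nan, 5, 6]})

# One row of column names per row of d: shape (len(d), number of columns).
names = np.tile(d.columns.values, (len(d), 1))
# DataFrame.where keeps cells where the condition holds (the NaNs) and
# takes the matching cell from `names` everywhere else.
labelled = d.where(d.isnull(), names)
print(labelled)
```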