
fill multiple missing values with series based on index values

Consider the pd.DataFrame df:

import numpy as np
import pandas as pd

df = pd.DataFrame([
        [np.nan, 1,      np.nan],
        [2,      np.nan, np.nan],
        [np.nan, np.nan, 3     ],
    ], list('abc'), list('xyz'))

df

     x    y    z
a  NaN  1.0  NaN
b  2.0  NaN  NaN
c  NaN  NaN  3.0

and the pd.Series s:

s = pd.Series([10, 20, 30], list('abc'))

How do I fill in the missing values of df with the corresponding values of s, matching the index of s against the index of df?

For example:

  • df.loc['c', 'x'] is NaN
  • s.loc['c'] is 30

Expected result:

      x     y     z
a  10.0   1.0  10.0
b   2.0  20.0  20.0
c  30.0  30.0   3.0

asked Nov 25 '25 10:11 by piRSquared


2 Answers

pandas handles this on a column basis with no issues. Suppose we had a different s, indexed by the column labels of df:

s = pd.Series([10, 20, 30], ['x', 'y', 'z'])

then we could simply call

df.fillna(s)

      x     y     z
a  10.0   1.0  30.0
b   2.0  20.0  30.0
c  10.0  20.0   3.0

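But that's not what you want. Using your s, which is indexed by the row labels of df:

s = pd.Series([10, 20, 30], ['a', 'b', 'c'])

then df.fillna(s) does nothing, because fillna matches the index of s against the column labels of df and none of 'a', 'b', 'c' is a column. But we know that it works for columns, so we can transpose, fill, and transpose back: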

df.T.fillna(s).T

      x     y     z
a  10.0   1.0  10.0
b   2.0  20.0  20.0
c  30.0  30.0   3.0
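
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the steps above. The names `unchanged` and `filled` are only illustrative and do not appear in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[np.nan, 1, np.nan],
     [2, np.nan, np.nan],
     [np.nan, np.nan, 3]],
    index=list("abc"), columns=list("xyz"),
)
s = pd.Series([10, 20, 30], index=list("abc"))

# fillna looks up the index of s among the column labels (x, y, z);
# none of a, b, c is a column, so every NaN stays in place.
unchanged = df.fillna(s)

# After transposing, the row labels a, b, c become column labels,
# so fillna can match them against s. Transpose back afterwards.
filled = df.T.fillna(s).T

print(filled)
```

Note that transposing a frame with mixed column dtypes may upcast the values to a common dtype, so this works most cleanly on all-numeric frames like the one here.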
answered Nov 28 '25 01:11 by Brian


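Another way is to fill each column separately:

def fillnull(col):
    # Work on a copy so the column inside df is not mutated
    col = col.copy()
    mask = col.isnull()
    # col and s share the row index, so the mask picks the matching labels in s
    col[mask] = s[mask]
    return col

df.apply(fillnull)

Note that it's less efficient than @Brian's way (9 ms per loop versus 1.5 ms per loop on my computer).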
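
If you want to check the timing on your own machine, a rough sketch along these lines should do. The helper names `via_transpose` and `via_apply` are made up for this example, and `via_apply` uses `Series.fillna(s)`, which aligns on the row index, as a shorter stand-in for the `fillnull` function. Timings will vary with pandas version and frame size, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[np.nan, 1, np.nan],
     [2, np.nan, np.nan],
     [np.nan, np.nan, 3]],
    index=list("abc"), columns=list("xyz"),
)
s = pd.Series([10, 20, 30], index=list("abc"))


def via_transpose(frame):
    # Transpose so the row labels become columns, fill, transpose back
    return frame.T.fillna(s).T


def via_apply(frame):
    # Series.fillna aligns s with each column on the row index
    return frame.apply(lambda col: col.fillna(s))


runs = 200
for fn in (via_transpose, via_apply):
    seconds = timeit.timeit(lambda: fn(df), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.3f} ms per call")
```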

answered Nov 27 '25 23:11 by Julien Marrec


