
fillna for category column with Series input does not work as expected

I have a category column which I want to fill with a Series. I tried this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b'], 'value': ['c', np.nan]})
df['value'] = df['value'].astype("category")
df['value'] = df['value'].cat.add_categories(df['key'].unique())
print(df['value'].cat.categories)
df['value'] = df['value'].fillna(df['key'])
print(df)

Expected output:

Index(['c', 'a', 'b'], dtype='object')
  key value
0   a     c
1   b     b

Actual output:

Index(['c', 'a', 'b'], dtype='object')
  key value
0   a     a
1   b     b
asked Dec 29 '25 09:12 by Hai


2 Answers

This appears to be a bug, but the workaround is simple: convert "value" to object dtype (so it is treated as a plain string column) before filling, then rebuild the categorical from the result.

df['value'] = pd.Categorical(
    df.value.astype(object).fillna(df.key), categories=df.stack().unique())
df

  key value
0   a     c
1   b     b
answered Jan 01 '26 01:01 by cs95


According to the doc, fillna on categorical data accepts a scalar, not a Series, so you may need to convert the column to object dtype, fill it, and then convert it back to category:

df.value.astype('object').fillna(df.key) # then convert to category again
Out[248]: 
0    c
1    b
Name: value, dtype: object

The parameter description quoted from the doc:

value : scalar
    Value to use to fill holes (e.g. 0)
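
The "convert to category again" step is not spelled out above, so here is a minimal sketch of the full round trip. It assumes you want to keep the categories already defined on the column (as in the question); the names cat_dtype and filled are just illustrative.

```python
import numpy as np
import pandas as pd

# Same setup as in the question.
df = pd.DataFrame({'key': ['a', 'b'], 'value': ['c', np.nan]})
df['value'] = df['value'].astype('category')
df['value'] = df['value'].cat.add_categories(df['key'].unique())

# Remember the categorical dtype so the category set survives the round trip.
cat_dtype = df['value'].dtype

# Fill on object dtype, where a Series argument is aligned by index.
filled = df['value'].astype(object).fillna(df['key'])

# Convert back to the original categorical dtype.
df['value'] = filled.astype(cat_dtype)
print(df)
```

Reusing the saved dtype keeps the categories in the order they had before filling. If you call astype('category') instead, pandas infers the categories from the values that are actually present.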

answered Jan 01 '26 00:01 by BENY


