I have a categorical column whose missing values I want to fill from another Series. I tried this:
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b'], 'value': ['c', np.nan]})
df['value'] = df['value'].astype("category")
# make every key a valid category before filling
df['value'] = df['value'].cat.add_categories(df['key'].unique())
print(df['value'].cat.categories)
df['value'] = df['value'].fillna(df['key'])
print(df)
Expected output:
Index(['c', 'a', 'b'], dtype='object')
  key value
0   a     c
1   b     b
Actual output:
Index(['c', 'a', 'b'], dtype='object')
  key value
0   a     a
1   b     b
This appears to be a bug, but thankfully the workaround is quite simple. You will have to treat "value" as a string column when filling.
df['value'] = pd.Categorical(
    df.value.astype(object).fillna(df.key), categories=df.stack().unique())
df
  key value
0   a     c
1   b     b
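For reference, here is a self-contained sketch of the same workaround that starts from the original frame. It builds the category list explicitly instead of using df.stack(); the category order (existing categories first, then the keys) is an assumption chosen to match the question, and the names filled and categories are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b'], 'value': ['c', np.nan]})
df['value'] = df['value'].astype('category')

# Fill on object dtype, where a Series fill value is aligned row by row.
filled = df['value'].astype(object).fillna(df['key'])

# Assumed ordering: existing categories first, then any new keys,
# de-duplicated while preserving order of appearance.
categories = list(dict.fromkeys(
    list(df['value'].cat.categories) + list(df['key'])))

# Convert back to a categorical column with the combined categories.
df['value'] = pd.Categorical(filled, categories=categories)
print(df)
print(df['value'].cat.categories)
```

Only the missing row takes its value from key; the existing 'c' is left alone.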
According to the docs, fillna on categorical data accepts a scalar, not a Series, so you may need to convert the column to object dtype first, fill it, and then convert it back to category:
df.value.astype('object').fillna(df.key) # then convert to category again
Out[248]:
0 c
1 b
Name: value, dtype: object
The parameter description in the docs reads:
value : scalar
    Value to use to fill holes (e.g. 0)
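As a small sketch of the scalar case the docs describe: filling with a single value works on the categorical directly, as long as that value is already one of the categories. The 'missing' label here is a hypothetical placeholder, not something from the question.

```python
import numpy as np
import pandas as pd

s = pd.Series(['c', np.nan], dtype='category')

# The fill value has to be a known category, so register it first.
s = s.cat.add_categories(['missing'])

# A scalar fill keeps the categorical dtype.
result = s.fillna('missing')
print(result)
```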