Hi, I want to change the values of a categorical variable to another value
whenever they appear in a list such as ['value1', 'value2'].
Here is my code:
random_sample['NAME_INCOME_TYPE_ind'] = np.where(random_sample['NAME_INCOME_TYPE'] in ['Maternity leave', 'Student']), 'Other')
I tried adding .any() at different positions in this line of code, but that does not resolve the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
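When dealing with categoricals, you can replace categories with another category rather than replacing the strings themselves. This has memory and performance benefits, because pandas internally factorises categorical data: each value is stored as a small integer code pointing into an array of the distinct categories.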
import pandas as pd

df = pd.DataFrame({'NAME_INCOME_TYPE': ['Employed', 'Maternity leave',
                                        'Benefits', 'Student']})
# turn object series to categorical
label_col = 'NAME_INCOME_TYPE'
df[label_col] = df[label_col].astype('category')
# define others
others = ['Maternity leave', 'Student']
others_label = 'Other'
# add new category and replace existing categories
df[label_col] = df[label_col].cat.add_categories([others_label])
df[label_col] = df[label_col].replace(others, others_label)
print(df)
  NAME_INCOME_TYPE
0         Employed
1            Other
2         Benefits
3            Other
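The memory claim can be checked with a rough sketch. It assumes the four sample labels are repeated many times to mimic a larger column; the names `labels`, `as_object` and `as_category` are illustrative only, and the exact byte counts will depend on your pandas version.

```python
import pandas as pd

# illustrative data: the four sample labels repeated many times
labels = ['Employed', 'Maternity leave', 'Benefits', 'Student'] * 10_000
as_object = pd.Series(labels, dtype='object')
as_category = as_object.astype('category')

# deep=True also counts the Python string objects, not just the pointers
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(object_bytes, category_bytes)

# the integer codes that stand in for the strings
print(as_category.cat.codes.head(4).tolist())
```

The categorical version should come out much smaller, because each row holds only a small integer code rather than a full string.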
The conversion and replacement can also be written more succinctly using method chaining:
# define others
others, others_label = ['Maternity leave', 'Student'], 'Other'
# turn to categorical, add category, then replace
df['NAME_INCOME_TYPE'] = (
    df['NAME_INCOME_TYPE']
    .astype('category')
    .cat.add_categories([others_label])
    .replace(others, others_label)
)
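As for the error in the question: Python's `in` operator expects a single True/False answer, which a whole Series cannot give, hence the ValueError. Adding `.any()` does not help, because the test is needed per row, not once for the whole column. A minimal sketch of an element-wise version, assuming the column holds plain strings and reusing the sample data from this answer (the name `is_other` is illustrative), builds a boolean mask with `Series.isin` and passes both branches to `np.where`:

```python
import numpy as np
import pandas as pd

random_sample = pd.DataFrame({'NAME_INCOME_TYPE': ['Employed', 'Maternity leave',
                                                   'Benefits', 'Student']})

# isin() tests each row separately and returns a boolean Series
is_other = random_sample['NAME_INCOME_TYPE'].isin(['Maternity leave', 'Student'])

# np.where needs both branches: 'Other' where the mask is True,
# the original value everywhere else
random_sample['NAME_INCOME_TYPE_ind'] = np.where(
    is_other, 'Other', random_sample['NAME_INCOME_TYPE'])

print(random_sample)
```

This leaves the original column untouched and writes the grouped labels to `NAME_INCOME_TYPE_ind`, as the question's code intended.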