How could I achieve something like np.where(df[variable] in ['value1','value2'])

Question

Hi, I want to replace some values of a categorical variable with another value when they appear in a list such as ['value1', 'value2'].

Here is my code:

random_sample['NAME_INCOME_TYPE_ind'] = np.where(random_sample['NAME_INCOME_TYPE'] in ['Maternity leave', 'Student']), 'Other')

I tried adding .any() at different positions in this line of code, but that did not resolve the error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

jpp · Accepted Answer

Use Categorical Data for categorical variables

When working with categorical variables, you can replace the categories themselves rather than replacing strings row by row. This has memory and performance benefits, because pandas stores categorical data internally as integer codes that point to a small set of categories (factorisation).
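As a rough illustration of the memory claim, the sketch below compares the same labels stored as an object Series and as a categorical. The data (four labels repeated 25,000 times) and the variable names are made up for the example; actual savings depend on how many distinct values your column has.

```python
import pandas as pd

# Hypothetical data: four distinct labels repeated many times
labels = ['Employed', 'Maternity leave', 'Benefits', 'Student'] * 25_000

as_object = pd.Series(labels, dtype=object)
as_category = as_object.astype('category')

# deep=True also counts the Python string objects held by the object Series
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(object_bytes, category_bytes)

# The categorical stores one small integer code per row
print(as_category.cat.codes.dtype)
```

With only a handful of categories, each row costs a single small integer code instead of a reference to a full Python string.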

import pandas as pd

df = pd.DataFrame({'NAME_INCOME_TYPE': ['Employed', 'Maternity leave',
                                        'Benefits', 'Student']})

# turn object series to categorical
label_col = 'NAME_INCOME_TYPE'
df[label_col] = df[label_col].astype('category')

# define others
others = ['Maternity leave', 'Student']
others_label = 'Other'

# add new category and replace existing categories
df[label_col] = df[label_col].cat.add_categories([others_label])
df[label_col] = df[label_col].replace(others, others_label)

print(df)

  NAME_INCOME_TYPE
0         Employed
1            Other
2         Benefits
3            Other
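Depending on your pandas version, calling replace on a categorical column may emit a deprecation warning. If that happens, one possible alternative (a sketch, not part of the original answer) is to assign the new label through a boolean mask and then drop the categories that are no longer used. The variable names `s` and `is_other` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'NAME_INCOME_TYPE': ['Employed', 'Maternity leave',
                                        'Benefits', 'Student']})

others = ['Maternity leave', 'Student']
others_label = 'Other'

# convert to categorical and register the new label as a valid category
s = df['NAME_INCOME_TYPE'].astype('category').cat.add_categories([others_label])

# element-wise membership test, then assign the new label where it is True
is_other = s.isin(others)
s[is_other] = others_label

# drop 'Maternity leave' and 'Student', which no longer occur
df['NAME_INCOME_TYPE'] = s.cat.remove_unused_categories()

print(df)
```

Assigning a value to a categorical only works when that value is already one of its categories, which is why add_categories comes first.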

The categorical replacement can also be written more succinctly using method chaining:

# define others
others, others_label = ['Maternity leave', 'Student'], 'Other'

# turn to categorical, add category, then replace
df['NAME_INCOME_TYPE'] = (
    df['NAME_INCOME_TYPE']
    .astype('category')
    .cat.add_categories([others_label])
    .replace(others, others_label)
)
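If you would rather keep the np.where form from the question, the following sketch shows one way to do it. The error arises because `in` asks Python for a single True or False, which a Series cannot supply; `Series.isin` performs the membership test element by element instead. Note also that np.where needs a third argument for the rows that should stay unchanged. The sample data and the `is_other` name are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the asker's random_sample
random_sample = pd.DataFrame({'NAME_INCOME_TYPE': ['Employed', 'Maternity leave',
                                                   'Benefits', 'Student']})

others = ['Maternity leave', 'Student']

# boolean Series: True where the value is one of `others`
is_other = random_sample['NAME_INCOME_TYPE'].isin(others)

# 'Other' where the mask is True, the original value everywhere else
random_sample['NAME_INCOME_TYPE_ind'] = np.where(
    is_other, 'Other', random_sample['NAME_INCOME_TYPE']
)

print(random_sample)
```

This keeps the column as plain strings, so it does not give the memory benefits of the categorical approach.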

Tags: python, pandas, numpy, series, categorical-data

Pumpkin C
