
How can I achieve something like np.where(df[variable] in ['value1', 'value2'])?

Hi, I want to replace the value of a categorical variable with another value whenever it is one of a list such as ['value1', 'value2'].

Here is my code:

random_sample['NAME_INCOME_TYPE_ind'] = np.where(random_sample['NAME_INCOME_TYPE'] in ['Maternity leave', 'Student'], 'Other')

I tried adding .any() at different positions in this line of code, but that does not resolve the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Pumpkin C asked Oct 17 '22 08:10


1 Answer

Use Categorical Data for categorical variables

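When dealing with categorical variables, you can replace one category with another instead of replacing the individual strings. This has memory and performance benefits, because pandas factorises categorical data internally: each row stores a small integer code that points into an array of the unique categories.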
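To make the memory point concrete, here is a rough sketch that compares the same labels stored as an object Series and as a categorical one. The repeated sample data and the variable names are made up for illustration, and the exact byte counts will depend on your pandas version and platform.

```python
import pandas as pd

# Hypothetical sample data: the four labels from this answer, repeated many times
labels = ['Employed', 'Maternity leave', 'Benefits', 'Student']
as_object = pd.Series(labels * 25_000, dtype=object)
as_category = as_object.astype('category')

# deep=True also counts the Python string objects, not only the pointers to them
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"object:   {object_bytes:,} bytes")
print(f"category: {category_bytes:,} bytes")

# Each row of the categorical holds an integer code into the category array
print(as_category.cat.codes.head(4).tolist())
print(list(as_category.cat.categories))
```

On a tiny frame like the four-row example in this answer the categorical overhead can outweigh the saving, so the benefit mainly shows up on long columns with few distinct values.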

import pandas as pd

df = pd.DataFrame({'NAME_INCOME_TYPE': ['Employed', 'Maternity leave',
                                        'Benefits', 'Student']})

# turn object series to categorical
label_col = 'NAME_INCOME_TYPE'
df[label_col] = df[label_col].astype('category')

# define others
others = ['Maternity leave', 'Student']
others_label = 'Other'

# add new category and replace existing categories
df[label_col] = df[label_col].cat.add_categories([others_label])
df[label_col] = df[label_col].replace(others, others_label)

print(df)

  NAME_INCOME_TYPE
0         Employed
1            Other
2         Benefits
3            Other

You can also write this more succinctly using method chaining:

# define others
others, others_label = ['Maternity leave', 'Student'], 'Other'

# turn to categorical, add category, then replace
df['NAME_INCOME_TYPE'] = (
    df['NAME_INCOME_TYPE']
    .astype('category')
    .cat.add_categories([others_label])
    .replace(others, others_label)
)
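As for the error in the question itself: `series in [...]` asks Python for a single True/False about the whole Series, which is why pandas raises the "truth value of a Series is ambiguous" error. The element-wise membership test is `Series.isin`. Below is a minimal sketch of the `np.where` version. It assumes you want to keep the original value for rows that are not in the list, since the question does not say what the "else" value should be, and the sample frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's random_sample frame
random_sample = pd.DataFrame({'NAME_INCOME_TYPE': ['Employed', 'Maternity leave',
                                                   'Benefits', 'Student']})
others = ['Maternity leave', 'Student']

# isin returns a boolean Series with one True/False per row
is_other = random_sample['NAME_INCOME_TYPE'].isin(others)

# np.where(condition, value_if_true, value_if_false)
# Assumption: rows outside the list keep their original value
random_sample['NAME_INCOME_TYPE_ind'] = np.where(
    is_other, 'Other', random_sample['NAME_INCOME_TYPE']
)

print(random_sample)
```

This works on plain string columns as well, so it may be the simpler choice if you do not need the categorical dtype.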
jpp answered Oct 23 '22 09:10