I want to replace all categories in a pandas DataFrame column with 'Others' if the value count for that category is less than 10.
I am trying something like this:
df['variable'].where(df['variable'].apply(lambda x: x.map(x.value_counts()))<=10, "other")
But I get the following error:
AttributeError: 'str' object has no attribute 'map'
You can count the occurrences of each value with pd.Series.value_counts
and then identify the values whose count falls below the threshold. Then use pd.DataFrame.loc
with Boolean indexing to overwrite them:
counts = df['variable'].value_counts()
idx = counts[counts.lt(10)].index
df.loc[df['variable'].isin(idx), 'variable'] = 'Others'
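As a minimal, self-contained sketch of the same three lines on string data: the DataFrame below is made up for illustration, and it assumes the column holds plain strings (object dtype). For a column with categorical dtype, 'Others' would likely need to be added as a category first, for example with Series.cat.add_categories.

```python
import pandas as pd

# Hypothetical data: 'a' occurs 12 times, 'b' 10 times, 'c' 3 times, 'd' once.
df = pd.DataFrame({'variable': ['a'] * 12 + ['b'] * 10 + ['c'] * 3 + ['d']})

# Frequency of each distinct value in the column.
counts = df['variable'].value_counts()

# Labels whose frequency is strictly below the threshold of 10.
rare = counts[counts.lt(10)].index

# Overwrite only the rows holding a rare label.
df.loc[df['variable'].isin(rare), 'variable'] = 'Others'

print(df['variable'].value_counts())
```

Here 'c' and 'd' are merged into 'Others', while 'b', with exactly 10 occurrences, is kept because the comparison is strict.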
In general you should avoid apply + lambda, as this is non-vectorised and little more than a thinly veiled loop. Here's a worked example with numeric data and added columns to demonstrate the logic:
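import numpy as np
import pandas as pd

np.random.seed(0)
arr = np.random.randint(0, 12, 100)
df = pd.DataFrame({'A': arr, 'B': arr})

# Count occurrences of each value and collect those seen fewer than 10 times
counts = df['A'].value_counts()
idx = counts[counts.lt(10)].index

df['counts'] = df['A'].map(counts)   # per-row count, for inspection only
df.loc[df['A'].isin(idx), 'B'] = -1  # flag rare values in column B
print(df.head(11))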
     A   B  counts
0    5  -1       9
1    0  -1       9
2    3   3      14
3   11  -1       5
4    3   3      14
5    7   7      10
6    9  -1       9
7    3   3      14
8    5  -1       9
9    2  -1       5
10   4   4      13
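For completeness, the attempt in the question can be made to work. Series.apply calls the lambda once per element, so x is a single string rather than a Series, which is why x.map raises the AttributeError. Mapping the column onto its own value counts avoids apply entirely. The sketch below uses made-up data and assumes a threshold of 10. Note that Series.where keeps values where the condition is True, so the condition has to select the frequent values (>= 10), not the rare ones.

```python
import pandas as pd

# Hypothetical data: 'a' occurs 12 times, 'b' 10 times, 'c' 3 times, 'd' once.
df = pd.DataFrame({'variable': ['a'] * 12 + ['b'] * 10 + ['c'] * 3 + ['d']})

# For every row, look up how often that row's value occurs in the column.
freq = df['variable'].map(df['variable'].value_counts())

# Keep values seen at least 10 times; replace everything else with 'Others'.
df['variable'] = df['variable'].where(freq >= 10, 'Others')

print(df['variable'].unique())
```

This gives the same result as the loc-based approach. Which one to use is mostly a matter of taste: where returns a new Series, while loc assigns in place.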