 

Replace column value with a constant if the value count for that category is less than 10 (Python)


I want to replace every category in a pandas DataFrame column with 'Others' if the value count for that category is less than 10.

I am trying something like this:

df['variable'].where(df['variable'].apply(lambda x: x.map(x.value_counts()))<=10, "other")

But I get the following error:

AttributeError: 'str' object has no attribute 'map'
Karan Gautam asked Sep 30 '18 22:09



1 Answer

You can count the occurrences of each value with pd.Series.value_counts and then identify the values whose count falls below a threshold. Then use pd.DataFrame.loc with Boolean indexing to overwrite the matching rows:

counts = df['variable'].value_counts()
idx = counts[counts.lt(10)].index

df.loc[df['variable'].isin(idx), 'variable'] = 'Others'
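
To see the effect on string categories like the ones in the question, here is a minimal sketch on made-up data. The category labels and their frequencies are assumptions chosen for illustration, not the asker's data.

```python
import pandas as pd

# Hypothetical data: 'a' appears 12 times, 'b' 10 times, 'c' 3 times, 'd' once.
df = pd.DataFrame({'variable': ['a'] * 12 + ['b'] * 10 + ['c'] * 3 + ['d']})

counts = df['variable'].value_counts()
rare = counts[counts.lt(10)].index  # categories seen fewer than 10 times

# Overwrite only the rows whose category is rare.
df.loc[df['variable'].isin(rare), 'variable'] = 'Others'

print(df['variable'].value_counts())
```

Note that 'b', with exactly 10 occurrences, is kept, because lt(10) is a strict comparison.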

In general you should avoid apply + lambda, as this is non-vectorised and little more than a thinly veiled loop. Here's a worked example with numeric data and added columns to demonstrate the logic:

import numpy as np
import pandas as pd

np.random.seed(0)

arr = np.random.randint(0, 12, 100)
df = pd.DataFrame({'A': arr, 'B': arr})

counts = df['A'].value_counts()
idx = counts[counts.lt(10)].index

df['counts'] = df['A'].map(counts)
df.loc[df['A'].isin(idx), 'B'] = -1

print(df.head(11))

     A  B  counts
0    5 -1       9
1    0 -1       9
2    3  3      14
3   11 -1       5
4    3  3      14
5    7  7      10
6    9 -1       9
7    3  3      14
8    5 -1       9
9    2 -1       5
10   4  4      13
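
For comparison, the attempt in the question can probably be repaired without apply. Series.apply calls the lambda once per element, so x is a single string there, which is why 'str' object has no attribute 'map' is raised. Mapping the whole column onto its own value counts avoids that. The sketch below uses made-up data. Because Series.where keeps values where the condition is True, the condition has to be "count is at least 10", not "<= 10".

```python
import pandas as pd

# Hypothetical data: 'a' appears 12 times, 'b' 10 times, 'c' 3 times, 'd' once.
df = pd.DataFrame({'variable': ['a'] * 12 + ['b'] * 10 + ['c'] * 3 + ['d']})

# Map every row to the frequency of its own category (vectorised, no apply).
freq = df['variable'].map(df['variable'].value_counts())

# Keep values whose category occurs at least 10 times; replace the rest.
df['variable'] = df['variable'].where(freq.ge(10), 'Others')

print(df['variable'].value_counts())
```

This is an alternative spelling of the same logic as the loc version, not a different result.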
jpp answered Oct 27 '22 01:10
