I have the following pandas DataFrame:
import pandas as pd 
from pandas import Series, DataFrame
data = DataFrame({'Qu1': ['apple', 'potato', 'cheese', 'banana', 'cheese', 'banana', 'cheese', 'potato', 'egg'],
              'Qu2': ['sausage', 'banana', 'apple', 'apple', 'apple', 'sausage', 'banana', 'banana', 'banana'],
              'Qu3': ['apple', 'potato', 'sausage', 'cheese', 'cheese', 'potato', 'cheese', 'potato', 'egg']})
I'd like to change the values in columns Qu1, Qu2 and Qu3 according to value_counts(), keeping a value only when its count is greater than or equal to some number.
For example, for the Qu1 column:
>>> data.Qu1.value_counts() >= 2
cheese     True
potato     True
banana     True
apple     False
egg       False
I'd like to keep the values cheese, potato and banana, because each of them appears at least twice.
The values apple and egg should be replaced with the value other.
For column Qu2, nothing changes:
>>> data.Qu2.value_counts() >= 2
banana     True
apple      True
sausage    True
The final result should look like the attached test_data:
test_data = DataFrame({'Qu1': ['other', 'potato', 'cheese', 'banana', 'cheese', 'banana', 'cheese', 'potato', 'other'],
                  'Qu2': ['sausage', 'banana', 'apple', 'apple', 'apple', 'sausage', 'banana', 'banana', 'banana'],
                  'Qu3': ['other', 'potato', 'other', 'cheese', 'cheese', 'potato', 'cheese', 'potato', 'other']})
Thanks!
value_counts() returns a Series containing the counts of unique values. The result is sorted in descending order, so the first element is the most frequently occurring value.
DataFrame.replace() replaces one value with another, by default across all columns.
I would create a DataFrame of the same shape in which each entry is the count of the corresponding value within its column:
data.apply(lambda x: x.map(x.value_counts()))
Out[229]: 
   Qu1  Qu2  Qu3
0    1    2    1
1    2    4    3
2    3    3    1
3    2    3    3
4    3    3    3
5    2    2    3
6    3    4    3
7    2    4    3
8    1    4    1
Then use the result in DataFrame.where to return "other" wherever the corresponding count is smaller than 2:
data.where(data.apply(lambda x: x.map(x.value_counts()))>=2, "other")
      Qu1      Qu2     Qu3
0   other  sausage   other
1  potato   banana  potato
2  cheese    apple   other
3  banana    apple  cheese
4  cheese    apple  cheese
5  banana  sausage  potato
6  cheese   banana  cheese
7  potato   banana  potato
8   other   banana   other
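The two steps above can be wrapped into a small helper so that the threshold and the replacement label are not hard-coded. This is only a sketch: the function name replace_rare and its parameters min_count and label are made up for illustration and are not part of pandas.

```python
import pandas as pd


def replace_rare(df, min_count=2, label="other"):
    """Replace values that occur fewer than min_count times in their column.

    Hypothetical helper; the name and parameters are illustrative only.
    """
    # Same shape as df: each cell holds how often its value occurs in its column.
    counts = df.apply(lambda col: col.map(col.value_counts()))
    # Keep cells whose count meets the threshold, put label everywhere else.
    return df.where(counts >= min_count, label)


data = pd.DataFrame({
    'Qu1': ['apple', 'potato', 'cheese', 'banana', 'cheese', 'banana', 'cheese', 'potato', 'egg'],
    'Qu2': ['sausage', 'banana', 'apple', 'apple', 'apple', 'sausage', 'banana', 'banana', 'banana'],
    'Qu3': ['apple', 'potato', 'sausage', 'cheese', 'cheese', 'potato', 'cheese', 'potato', 'egg'],
})

result = replace_rare(data)
print(result)
```

Note that value_counts() ignores missing values by default, so with this sketch a NaN cell would also end up as the label; handle NaN separately if that is not what you want.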
You could first compute the value counts for each column (df is the DataFrame from the question):
value_counts = df.apply(lambda x: x.value_counts())
         Qu1  Qu2  Qu3
apple    1.0  3.0  1.0
banana   2.0  4.0  NaN
cheese   3.0  NaN  3.0
egg      1.0  NaN  1.0
potato   2.0  NaN  3.0
sausage  NaN  2.0  1.0
Then build a dictionary that will contain the replacements for each column:
from itertools import cycle
replacements = {}
for col, s in value_counts.items():
    if s[s<2].any():
        replacements[col] = dict(zip(s[s < 2].index.tolist(), cycle(['other'])))
replacements
{'Qu1': {'egg': 'other', 'apple': 'other'}, 'Qu3': {'egg': 'other', 'apple': 'other', 'sausage': 'other'}}
Use the dictionary to replace the values:
df.replace(replacements)
      Qu1      Qu2     Qu3
0   other  sausage   other
1  potato   banana  potato
2  cheese    apple   other
3  banana    apple  cheese
4  cheese    apple  cheese
5  banana  sausage  potato
6  cheese   banana  cheese
7  potato   banana  potato
8   other   banana   other
Or wrap the loop in a dictionary comprehension:
from itertools import cycle
df.replace({col: dict(zip(s[s < 2].index.tolist(), cycle(['other']))) for col, s in value_counts.items() if s[s < 2].any()})
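As a possible simplification, the zip/cycle pairing can be swapped for dict.fromkeys, which maps every rare value to the same label directly. The sketch below is a variation on the approach above rather than the original code, and it rebuilds df from the question's data so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Qu1': ['apple', 'potato', 'cheese', 'banana', 'cheese', 'banana', 'cheese', 'potato', 'egg'],
    'Qu2': ['sausage', 'banana', 'apple', 'apple', 'apple', 'sausage', 'banana', 'banana', 'banana'],
    'Qu3': ['apple', 'potato', 'sausage', 'cheese', 'cheese', 'potato', 'cheese', 'potato', 'egg'],
})

# One column of counts per original column; values absent from a column are NaN.
value_counts = df.apply(lambda col: col.value_counts())

# For each column with at least one rare value, map every rare value to 'other'.
# NaN < 2 evaluates to False, so values missing from a column are skipped.
replacements = {
    col: dict.fromkeys(s[s < 2].index, 'other')
    for col, s in value_counts.items()
    if (s < 2).any()
}

result = df.replace(replacements)
print(result)
```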
However, this is not only more cumbersome but also slower than using .where. Testing with 3,000 columns:
df = pd.concat([df for i in range(1000)], axis=1)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Columns: 3000 entries, Qu1 to Qu3
dtypes: object(3000)
Using .replace():
%%timeit
value_counts = df.apply(lambda x: x.value_counts())
df.replace({col: dict(zip(s[s < 2].index.tolist(), cycle(['other']))) for col, s in value_counts.items() if s[s < 2].any()})
1 loop, best of 3: 4.97 s per loop
vs .where():
%%timeit
df.where(df.apply(lambda x: x.map(x.value_counts()))>=2, "other")
1 loop, best of 3: 2.01 s per loop
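Outside IPython, where %%timeit is not available, a rough version of this comparison can be put together with the standard timeit module. The sketch below makes a few assumptions that differ from the test above: it uses only 100 copies of the frame, gives the copies unique column names via add_suffix, and the helper names with_replace and with_where are invented for illustration. Absolute timings will depend on the machine and the pandas version.

```python
import timeit
from itertools import cycle

import pandas as pd

base = pd.DataFrame({
    'Qu1': ['apple', 'potato', 'cheese', 'banana', 'cheese', 'banana', 'cheese', 'potato', 'egg'],
    'Qu2': ['sausage', 'banana', 'apple', 'apple', 'apple', 'sausage', 'banana', 'banana', 'banana'],
    'Qu3': ['apple', 'potato', 'sausage', 'cheese', 'cheese', 'potato', 'cheese', 'potato', 'egg'],
})

# Assumption: 100 copies with unique names (Qu1_0, Qu2_0, ...) to keep the run short.
wide = pd.concat([base.add_suffix(f"_{i}") for i in range(100)], axis=1)


def with_replace(df):
    # Per-column counts, then a nested {column: {rare value: 'other'}} mapping.
    value_counts = df.apply(lambda x: x.value_counts())
    mapping = {col: dict(zip(s[s < 2].index.tolist(), cycle(['other'])))
               for col, s in value_counts.items() if s[s < 2].any()}
    return df.replace(mapping)


def with_where(df):
    # Count of each cell's value within its column, then mask the rare ones.
    return df.where(df.apply(lambda x: x.map(x.value_counts())) >= 2, "other")


for name, func in [("replace", with_replace), ("where", with_where)]:
    seconds = timeit.timeit(lambda: func(wide), number=3) / 3
    print(f"{name}: {seconds:.3f} s per call")
```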