How can I categorize all columns in a DataFrame at once? (Make all values become High, Medium, Low)

I am trying to convert every numeric value in my dataset into a categorical value: Low, Medium (average) or High, depending on where it falls relative to the quantiles of its column.

For example, a value below the 25th percentile of its column should become "Low".

I tried assign, and then apply with a function I wrote:

def turn_into_categorical(row):
    quantile_level = [.25, .5, .75]
    for r in row:
        cut = refugees_T_F_V_P_full_data.r.quantile(quantile_level)
        if r >= cut[.75]:
            return "High"
        elif r >= cut[.25] and r < cut[0.75]:
            return "Average"
        else:
            return "Low"

refugees_T_F_V_P_full_data.apply(turn_into_categorical, axis = 1)

However, the code doesn't work. I also tried iterrows, but is there a faster way to do it?

Here is the data I want to convert. Every number except those in Year and Month should be categorized as Low, Medium or High, depending on its column's quantiles.

    Year  Month  Central Equatoria  Eastern Equatoria  Gogrial  Jonglei
0   2014     10                6.0                1.0      0.0      3.0   
1   2014     11                4.0                3.0      0.0     12.0   
2   2014     12                3.0                5.0      0.0     11.0   
3   2015      1                7.0                2.0      0.0      4.0   
4   2015      2                5.0                5.0      0.0     10.0   
5   2015      3                7.0                5.0      0.0      8.0   
6   2015      4                4.0                1.0      0.0      6.0   
7   2015      5                5.0                0.0      0.0      7.0   
8   2015      6                4.0                1.0      0.0      6.0   
9   2015      7               15.0                2.0      0.0      9.0   
10  2015      8               10.0                7.0      0.0      9.0   
11  2015      9               12.0                0.0      0.0      8.0   
12  2015     10               12.0                0.0      0.0      5.0   
13  2015     11                8.0                5.0      0.0     10.0   
14  2015     12                5.0                7.0      0.0      3.0

Expected Outcome: (Example)

    Year  Month  Central Equatoria  Eastern Equatoria  Gogrial  Jonglei
0   2014     10               High             Medium      Low   Medium
1   2014     11                Low             Medium      Low     High

asked Jan 25 '19 17:01

CASA_DE_BS

2 Answers

Looks like you want pd.qcut, which does precisely that. From the docs:

Quantile-based discretization function

So you could apply pd.qcut along the columns of the dataframe from Central Equatoria onwards, specifying the quantiles used to bin each Series with q = [0, 0.25, 0.75, 1.0]. A column with a single unique value (such as Gogrial) has identical bin edges, which pd.qcut rejects, so it is labelled 'low' directly:

df.loc[:, 'Central Equatoria':].apply(
    lambda x: pd.qcut(x, q=[0, 0.25, 0.75, 1.0],
                      labels=['low', 'medium', 'high'])
    if x.nunique() != 1 else 'low')

Output

       Central Equatoria Eastern Equatoria Gogrial Jonglei
0            medium              low     low     low
1               low           medium     low    high
2               low           medium     low    high
3            medium           medium     low     low
4            medium           medium     low    high
5            medium           medium     low  medium
6               low              low     low  medium
7            medium              low     low  medium
8               low              low     low  medium
9              high           medium     low  medium
10             high             high     low  medium
11             high              low     low  medium
12             high              low     low     low
13           medium           medium     low    high
14           medium             high     low     low
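
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the question's sample data (written as integers for brevity) and uses a helper named to_levels, which is a hypothetical name introduced only for this illustration. Converting the binned result to plain strings is an assumption made here so that every column ends up with the same type.

```python
import pandas as pd

# Sample data copied from the question (integers instead of floats for brevity).
df = pd.DataFrame({
    "Year": [2014] * 3 + [2015] * 12,
    "Month": [10, 11, 12] + list(range(1, 13)),
    "Central Equatoria": [6, 4, 3, 7, 5, 7, 4, 5, 4, 15, 10, 12, 12, 8, 5],
    "Eastern Equatoria": [1, 3, 5, 2, 5, 5, 1, 0, 1, 2, 7, 0, 0, 5, 7],
    "Gogrial": [0] * 15,
    "Jonglei": [3, 12, 11, 4, 10, 8, 6, 7, 6, 9, 9, 8, 5, 10, 3],
})


def to_levels(col: pd.Series) -> pd.Series:
    """Hypothetical helper: bin one column into low/medium/high by quantile."""
    # A constant column has identical bin edges, which pd.qcut rejects,
    # so label it 'low' directly (same fallback as in the answer above).
    if col.nunique() == 1:
        return pd.Series("low", index=col.index)
    binned = pd.qcut(col, q=[0, 0.25, 0.75, 1.0],
                     labels=["low", "medium", "high"])
    # Plain strings keep all columns the same type (an assumption of this sketch).
    return binned.astype(str)


# Every column except Year and Month is binned; column order is preserved.
value_cols = df.columns.drop(["Year", "Month"])
levels = df[value_cols].apply(to_levels)

# Put Year and Month back in front of the categorized columns.
result = pd.concat([df[["Year", "Month"]], levels], axis=1)
print(result)
```

Note that the nunique() check only covers fully constant columns. A column where, say, more than a quarter of the values equal the minimum would still produce duplicate bin edges and would need separate handling.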

answered Oct 11 '22 01:10

yatu

One idea using pd.DataFrame.quantile with pd.cut:

cats = ['Low', 'Medium', 'High']
quantiles = df.iloc[:, 2:].quantile([0, 0.25, 0.75, 1.0])

for col in df.iloc[:, 2:]:
    bin_edges = quantiles[col]
    # special case situations where all values are equal
    if bin_edges.nunique() == 1:
        df[col] = 'Low'
    else:
        df[col] = pd.cut(df[col], bins=bin_edges, labels=cats, include_lowest=True)

Result:

print(df)

    Year  Month CentralEquatoria EasternEquatoria Gogrial Jonglei
0   2014     10           Medium              Low     Low     Low
1   2014     11              Low           Medium     Low    High
2   2014     12              Low           Medium     Low    High
3   2015      1           Medium           Medium     Low     Low
4   2015      2           Medium           Medium     Low    High
5   2015      3           Medium           Medium     Low  Medium
6   2015      4              Low              Low     Low  Medium
7   2015      5           Medium              Low     Low  Medium
8   2015      6              Low              Low     Low  Medium
9   2015      7             High           Medium     Low  Medium
10  2015      8             High             High     Low  Medium
11  2015      9             High              Low     Low  Medium
12  2015     10             High              Low     Low     Low
13  2015     11           Medium           Medium     Low    High
14  2015     12           Medium             High     Low     Low
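
As a possible alternative to looping over the columns, the sketch below compares the whole frame against its per-column quantiles in one step with numpy.select. This is not the method used in either answer, only an illustration under the assumption that the bins should be right-closed like the ones above: values at or below the 25% quantile are "Low", values above the 75% quantile are "High", and everything else is "Medium". The variable names are made up for this example.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (integers instead of floats for brevity).
df = pd.DataFrame({
    "Year": [2014] * 3 + [2015] * 12,
    "Month": [10, 11, 12] + list(range(1, 13)),
    "Central Equatoria": [6, 4, 3, 7, 5, 7, 4, 5, 4, 15, 10, 12, 12, 8, 5],
    "Eastern Equatoria": [1, 3, 5, 2, 5, 5, 1, 0, 1, 2, 7, 0, 0, 5, 7],
    "Gogrial": [0] * 15,
    "Jonglei": [3, 12, 11, 4, 10, 8, 6, 7, 6, 9, 9, 8, 5, 10, 3],
})

values = df.drop(columns=["Year", "Month"])

# One quantile per column; each result is a Series indexed by column name.
q25 = values.quantile(0.25)
q75 = values.quantile(0.75)

# Conditions are checked in order, so a constant column (where q25 == q75)
# matches the first condition and comes out as "Low" without a special case.
labels = np.select(
    [values.le(q25).to_numpy(), values.gt(q75).to_numpy()],
    ["Low", "High"],
    default="Medium",
)

levels = pd.DataFrame(labels, index=values.index, columns=values.columns)
result = pd.concat([df[["Year", "Month"]], levels], axis=1)
print(result)
```

Because the comparisons are made directly against the quantile values, duplicate quantiles do not raise an error here, which is the main design difference from the bin-edge approaches.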

answered Oct 10 '22 23:10

jpp

Tags: python, pandas, dataframe, categorical-data
