How to create categorical variable based on a numerical variable

My DataFrame has one column:

import pandas as pd
values = [1, 1, 4, 5, 6, 6, 30, 20, 80, 90]  # named "values" to avoid shadowing the built-in list
df = pd.DataFrame({'col1': values})

How can I add one more column 'col2' that would contain categorical information in reference to col1:

if col1 > 0 and col1 <= 10 then col2 = 'xxx'
if col1 > 10 and col1 <= 50 then col2 = 'yyy'
if col1 > 50 then col2 = 'zzz'

asked Sep 17 '15 15:09

Klausos Klausos

2 Answers

You could use pd.cut as follows:

df['col2'] = pd.cut(df['col1'], bins=[0, 10, 50, float('Inf')], labels=['xxx', 'yyy', 'zzz'])

Output:

   col1 col2
0     1  xxx
1     1  xxx
2     4  xxx
3     5  xxx
4     6  xxx
5     6  xxx
6    30  yyy
7    20  yyy
8    80  zzz
9    90  zzz
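
To make the bin edges concrete, here is a minimal sketch on a small made-up sample (the `edge` frame and its values are assumptions chosen to hit the boundaries, not data from the question). With the default `right=True`, each bin includes its right edge and excludes its left edge, so a value of 0 falls outside every bin and becomes NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical sample chosen to sit on the bin edges; not from the question.
edge = pd.DataFrame({'col1': [0, 1, 10, 11, 50, 51]})

# Default right=True: intervals are (0, 10], (10, 50], (50, inf].
edge['col2'] = pd.cut(
    edge['col1'],
    bins=[0, 10, 50, np.inf],
    labels=['xxx', 'yyy', 'zzz'],
)

# 0 is not in (0, 10], so it gets NaN; 10 and 50 stay in the lower bin.
print(edge)

# pd.cut returns a categorical column rather than plain strings.
print(edge['col2'].dtype)
```

If zero or negative values can occur and should still get a label, the lowest bin edge would need to be extended (for example to `-np.inf`); whether that is appropriate depends on the data.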

answered Sep 24 '22 14:09

DontDivideByZero

You could first create a new column col2, and update its values based on the conditions:

df['col2'] = 'zzz'
df.loc[(df['col1'] > 0) & (df['col1'] <= 10), 'col2'] = 'xxx'
df.loc[(df['col1'] > 10) & (df['col1'] <= 50), 'col2'] = 'yyy'
print(df)

Output:

   col1 col2
0     1  xxx
1     1  xxx
2     4  xxx
3     5  xxx
4     6  xxx
5     6  xxx
6    30  yyy
7    20  yyy
8    80  zzz
9    90  zzz
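
The same conditions can also be written as a single vectorized expression. The sketch below uses `numpy.select`, which is not part of the original answer; it is shown only as one possible variant of the masking idea. As with the masking code above, anything that matches no condition (including values of 0 or below) falls through to the default `'zzz'`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 4, 5, 6, 6, 30, 20, 80, 90]})

# One boolean mask per label, checked in order.
conditions = [
    (df['col1'] > 0) & (df['col1'] <= 10),
    (df['col1'] > 10) & (df['col1'] <= 50),
]
choices = ['xxx', 'yyy']

# Rows matching no condition receive the default label.
df['col2'] = np.select(conditions, choices, default='zzz')
print(df)
```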

Alternatively, you can also apply a function based on the column col1:

def func(x):
    if 0 < x <= 10:
        return 'xxx'
    elif 10 < x <= 50:
        return 'yyy'
    return 'zzz'

df['col2'] = df['col1'].apply(func)

and this will result in the same output.

For this small 10-row DataFrame, the apply approach is considerably faster:

%timeit run() # packaged to run the first approach
# 100 loops, best of 3: 3.28 ms per loop
%timeit df['col2'] = df['col1'].apply(func)
# 10000 loops, best of 3: 187 µs per loop

However, on a large DataFrame the built-in vectorized operations (i.e. the masking approach) are likely to be faster, because apply calls the Python function once per row.
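
To check how the two approaches compare on larger data on your own machine, a rough sketch along these lines may help. The frame `big`, its size, the random values, and the helper names `with_masks` and `with_apply` are all assumptions made for illustration; timings will vary by machine and pandas version, so no numbers are quoted here:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame; the size and value range are assumptions.
rng = np.random.default_rng(0)
big = pd.DataFrame({'col1': rng.integers(1, 100, size=100_000)})


def func(x):
    # Same row-wise function as in the answer above.
    if 0 < x <= 10:
        return 'xxx'
    elif 10 < x <= 50:
        return 'yyy'
    return 'zzz'


def with_masks(frame):
    # Masking approach: start from the default label, then overwrite by condition.
    out = pd.Series('zzz', index=frame.index)
    out.loc[(frame['col1'] > 0) & (frame['col1'] <= 10)] = 'xxx'
    out.loc[(frame['col1'] > 10) & (frame['col1'] <= 50)] = 'yyy'
    return out


def with_apply(frame):
    # apply approach: one Python function call per row.
    return frame['col1'].apply(func)


masks_result = with_masks(big)
apply_result = with_apply(big)

# Both approaches should label every row identically.
assert masks_result.tolist() == apply_result.tolist()

# Average seconds per run over a handful of repetitions.
runs = 5
print('masks:', timeit.timeit(lambda: with_masks(big), number=runs) / runs)
print('apply:', timeit.timeit(lambda: with_apply(big), number=runs) / runs)
```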

answered Sep 20 '22 14:09

YS-L

Tags: python, pandas, dataframe, categorical-data