Pandas: categorize column values by range

I have a dataframe, df, like this:

a  b  c
1  0  2
5  7  8
4  1  3
3  4  6
5  2  5
.......

Now I want to replace those values according to the following ranges:

0-3 = 1
4-6 = 2
7-9 = 3

Column values are less than 10, so the range is within 0-9.

I want to replace the above dataframe values with the range categories, and the output should look like this:

a  b  c
1  1  1
2  3  3
2  1  1
1  2  2
2  1  2
.......

So any value in df within 0-3 should be replaced by 1, any value within 4-6 by 2, and so on. How can I do this?

asdfkjasdfjk asked Oct 28 '17 13:10



1 Answer

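Use pd.cut with apply. The bins are right-inclusive by default, so the first edge is -1 to make sure 0 falls into the first bin, (-1, 3]: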

df.apply(lambda x : pd.cut(x,[-1,3,6,9],labels=[1,2,3]))
   a  b  c
0  1  1  1
1  2  3  3
2  2  1  1
3  1  2  2
4  2  1  2
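
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the apply-based approach. The sample frame is copied from the question; the variable names bins, labels, binned and result, and the final cast to int, are my own additions for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "a": [1, 5, 4, 3, 5],
    "b": [0, 7, 1, 4, 2],
    "c": [2, 8, 3, 6, 5],
})

# Bins are right-inclusive by default: (-1, 3], (3, 6], (6, 9].
# Starting at -1 makes sure that 0 lands in the first bin.
bins = [-1, 3, 6, 9]
labels = [1, 2, 3]

# Apply pd.cut to each column in turn.
binned = df.apply(lambda col: pd.cut(col, bins=bins, labels=labels))

# pd.cut yields categorical columns; cast to int if plain integers are wanted.
# (This cast assumes every value fell inside a bin, i.e. there are no NaNs.)
result = binned.astype(int)
print(result)
```

Note that without the cast the columns keep a category dtype, which may or may not be what you want downstream.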

A non-apply solution, suggested by @coldspeed:

pd.DataFrame(pd.cut(df.values.reshape(-1,),[-1,3,6,9],labels=[1,2,3]).codes.reshape(df.shape)+1,columns=df.columns)

or

pd.DataFrame(pd.cut(np.hstack(df.values),[-1,3,6,9],labels=[1,2,3]).codes.reshape(df.shape)+1,columns=df.columns)
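
The one-liners above can be unpacked into steps, which may make the codes + 1 trick easier to follow. This is only a sketch under a few assumptions of mine: it uses df.to_numpy().ravel() to flatten instead of df.values.reshape(-1,), it passes index=df.index so the original index is kept (the one-liners only carry over the columns), and the intermediate names flat, cat and codes are hypothetical.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "a": [1, 5, 4, 3, 5],
    "b": [0, 7, 1, 4, 2],
    "c": [2, 8, 3, 6, 5],
})

bins = [-1, 3, 6, 9]
labels = [1, 2, 3]

# Flatten every value into one 1-D array so pd.cut is called only once.
flat = df.to_numpy().ravel()
cat = pd.cut(flat, bins=bins, labels=labels)

# .codes is the 0-based position of each value's bin (-1 if outside all bins).
# Adding 1 reproduces the labels only because they are 1, 2, 3 in order.
codes = cat.codes.reshape(df.shape) + 1

result = pd.DataFrame(codes, index=df.index, columns=df.columns)
print(result)
```

With different labels (say 10, 20, 30) the + 1 shortcut would no longer match them, and a value outside 0-9 would silently become 0 instead of NaN, so this version fits best when the data is known to stay within the bins.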
Bharath answered Nov 06 '22 22:11
