Grouping numerical values in pandas

Question

In my Dataframe I have one column with numeric values, let say - distance. I want to find out which group of distance (range) have the biggest number of records (rows).

Doing simple: df.distance.count_values() returns:

74         1
90         1
94         1
893        1
889        1
885        1
877        1
833        1
122        1
545        1

What I want to achieve is something like buckets from histogram, so I am expecting output like this:

900         4 #all values < 900 and > 850
100         3
150         1
550         1
850         1

The one approach I've figured out so far, but I don't think is the best and most optimal one is just find max and min values, divide by my step (50 in this case) and then do loop checking all the values and assigning to appropriate group.

Is there any other, better approach for that?

DrTRD · Accepted Answer

I'd suggest doing the following, assuming your value column is labeled val

import numpy as np
df['bin'] = df['val'].apply(lambda x: 50*np.floor(x/50))

The result is the following:

df.groupby('bin')['val'].count()

sebap123 · Answer

Thanks to EdChum suggestion and based on this example I've figured out, the best way (at least for me) is to do something like this:

import numpy as np

step = 50

#...

max_val = df.distance.max()
bins = list(range(0,int(np.ceil(max_val/step))*step+step,step))
clusters = pd.cut(df.distance,bins,labels=bins[1:])

Grouping numerical values in pandas

Tags:

python

pandas

sebap123

2 Answers

DrTRD

sebap123

Recent Activity

Donate For Us

Grouping numerical values in pandas

Tags:

python

pandas

sebap123

2 Answers

DrTRD

sebap123

Related questions

Recent Activity

Donate For Us