Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Group numeric data into binned ranges

This is my original data:

    fi_diameter  ever_percent
0     -1.000000          0.00
1     -0.694212          0.00
2     -0.499782          0.00
3     -0.249749          0.00
4     -0.000000          0.00
5      0.249822          0.00
6      0.500218          0.00
7      0.749038          0.00
8      0.985645          0.00
9      1.251539          0.00
10     1.498179          0.00
11     2.122177          0.78
12     2.000000          3.70
13     2.251539          6.23
14     2.498179          8.66
15     2.746616         10.79
16     3.000000         12.13
17     3.251539         11.93
18     3.506353         10.78
19     3.756331          8.55
20     3.988504          5.90
21     4.237864          4.34
22     4.506353          2.89
23     5.011588          2.84
24     5.506353          1.88
25     5.965784          1.65
26     6.965784          2.49
27     7.965784          1.33
28     7.965784          3.13

I want to change my data to this type:

  1-2   4.48
  2-3  37.81
  3-4  37.16
  4-5   7.23
  5-6   6.37
  6-7   2.49
  7-8   4.46

I have try to use pandas of "groupby" ,but it not transform data to the like what i want it to be.

How do I get this output?

like image 576
Geologisted Avatar asked Dec 01 '25 06:12

Geologisted


1 Answers

  1. Filter out values under 1
  2. Use pd.cut and bin fi_diameter
  3. Use the banned labels to groupby and sum on ever_percent

df = df.query('fi_diameter >= 1')
grouper = pd.cut(
    df['fi_diameter'], 
    bins=[1, 2, 3, 4, 5, 6, 7, 8], 
    labels=['1-2', '2-3', '3-4', '4-5', '5-6', '6-7', '7-8']
)
df.groupby(grouper).ever_percent.sum().reset_index()

  fi_diameter  ever_percent
0         1-2          3.70
1         2-3         38.59
2         3-4         37.16
3         4-5          7.23
4         5-6          6.37
5         6-7          2.49
6         7-8          4.46

Here's how you generalise this for any upper bound.

mx = int(np.ceil(df['fi_diameter'].max()))
rn = np.arange(1, mx + 1)

df = df.query('fi_diameter >= 1')
grouper = pd.cut(
    df['fi_diameter'], 
    bins=rn, 
    labels=[
        str(i) + '-' + str(j) for i, j in zip(rn, rn[1:])
    ]
)    
df.groupby(grouper).ever_percent.sum().reset_index()

  fi_diameter  ever_percent
0         1-2          3.70
1         2-3         38.59
2         3-4         37.16
3         4-5          7.23
4         5-6          6.37
5         6-7          2.49
6         7-8          4.46
like image 69
cs95 Avatar answered Dec 05 '25 03:12

cs95



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!