python bin data and return bin midpoint (maybe using pandas.cut and qcut)

Can I make the pandas cut/qcut functions return the bin endpoint or bin midpoint instead of a string bin label?

Currently

pd.cut(pd.Series(np.arange(11)), bins = 5)

0     (-0.01, 2]
1     (-0.01, 2]
2     (-0.01, 2]
3         (2, 4]
4         (2, 4]
5         (4, 6]
6         (4, 6]
7         (6, 8]
8         (6, 8]
9        (8, 10]
10       (8, 10]
dtype: category

with category / string values. What I want is the same result with numerical values representing the edge or midpoint of each bin.

asked Sep 23 '15 16:09

jf328

3 Answers

I noticed that each category produced by pd.cut is an Interval object with a mid property, so you can calculate the midpoint via an apply:

In [1]: import pandas as pd
   ...: import numpy as np
   ...: df = pd.DataFrame({"val":np.arange(11)})
   ...: df["bins"] = pd.cut(df["val"], bins = 5)
   ...: df["bin_centres"] = df["bins"].apply(lambda x: x.mid)
   ...: df
Out[1]:
    val          bins bin_centres
0     0  (-0.01, 2.0]       0.995
1     1  (-0.01, 2.0]       0.995
2     2  (-0.01, 2.0]       0.995
3     3    (2.0, 4.0]       3.000
4     4    (2.0, 4.0]       3.000
5     5    (4.0, 6.0]       5.000
6     6    (4.0, 6.0]       5.000
7     7    (6.0, 8.0]       7.000
8     8    (6.0, 8.0]       7.000
9     9   (8.0, 10.0]       9.000
10   10   (8.0, 10.0]       9.000
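The apply above calls a Python lambda once per row. A possible vectorised variant is sketched below, assuming a pandas version in which the categories of a pd.cut result form an IntervalIndex; the column names bin_left and bin_right are made up for illustration. It looks up the per-bin midpoint and edges through the category codes and returns plain float columns rather than a category.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"val": np.arange(11)})
df["bins"] = pd.cut(df["val"], bins=5)

# The categories of the binned column form an IntervalIndex, which exposes
# vectorised .left, .right and .mid attributes (one value per bin).
categories = df["bins"].cat.categories

# .cat.codes holds the integer position of each row's bin. This sketch
# assumes there are no NaN values (a NaN row would have code -1).
codes = df["bins"].cat.codes.to_numpy()

df["bin_centres"] = categories.mid.to_numpy()[codes]
df["bin_left"] = categories.left.to_numpy()[codes]    # hypothetical column name
df["bin_right"] = categories.right.to_numpy()[codes]  # hypothetical column name
```

If the data can contain NaN, the rows with code -1 would need to be masked before the lookup.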

answered Oct 23 '22 21:10

erncyp

I see that this is an old post but I will take the liberty to answer it anyway.

It is now possible (ref @chrisb's answer) to access the endpoints for categorical intervals using left and right.

s = pd.cut(pd.Series(np.arange(11)), bins = 5)

mid = [(a.left + a.right)/2 for a in s]
Out[34]: [0.995, 0.995, 0.995, 3.0, 3.0, 5.0, 5.0, 7.0, 7.0, 9.0, 9.0]

Since intervals are open on the left and closed on the right, pandas extends the range slightly (by 0.1% of the data range) so that the minimum value is included. The 'first' interval therefore starts at -0.01 rather than 0. To get a midpoint using 0 as the left value, you can do this:

mid_alt = [(a.left + a.right)/2 if a.left != -0.01 else a.right/2 for a in s]
Out[35]: [1.0, 1.0, 1.0, 3.0, 3.0, 5.0, 5.0, 7.0, 7.0, 9.0, 9.0]
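The comparison against -0.01 above is tied to this particular data set. A more general sketch, assuming the goal is simply for the first bin to start at the data minimum, asks pd.cut for integer codes and the bin edges, then resets the lowest edge before averaging. The names codes, edges and centres are only illustrative.

```python
import numpy as np
import pandas as pd

values = pd.Series(np.arange(11))

# labels=False returns integer bin codes instead of Interval labels;
# retbins=True additionally returns the array of bin edges.
codes, edges = pd.cut(values, bins=5, labels=False, retbins=True)

# Work on a copy and undo the small extension pandas applied to the lowest
# edge, so that the first bin starts exactly at the data minimum.
edges = edges.copy()
edges[0] = values.min()

# Midpoint of each bin, then one midpoint per original value.
centres = (edges[:-1] + edges[1:]) / 2
mid_alt = centres[codes.to_numpy()]
```

This avoids comparing floats for equality and does not depend on knowing the size of the extension in advance.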

Or, you can say that the intervals are closed to the left and open to the right

t = pd.cut(pd.Series(np.arange(11)), bins = 5, right=False)
Out[38]: 
0       [0.0, 2.0)
1       [0.0, 2.0)
2       [2.0, 4.0)
3       [2.0, 4.0)
4       [4.0, 6.0)
5       [4.0, 6.0)
6       [6.0, 8.0)
7       [6.0, 8.0)
8     [8.0, 10.01)
9     [8.0, 10.01)
10    [8.0, 10.01)

But, as you can see, you get the same problem at the last interval, which now ends at 10.01 instead of 10.

answered Oct 23 '22 22:10

mortysporty

There's a work-in-progress proposal for an 'IntervalIndex' that would make this type of operation very straightforward.

But for now, you can get the bins by passing the retbins argument and calculate the midpoints.

In [8]: s, bins = pd.cut(pd.Series(np.arange(11)), bins = 5, retbins=True)

In [11]: mid = [(a + b) /2 for a,b in zip(bins[:-1], bins[1:])]

In [13]: s.cat.rename_categories(mid)
Out[13]: 
0     0.995
1     0.995
2     0.995
3     3.000
4     3.000
5     5.000
6     5.000
7     7.000
8     7.000
9     9.000
10    9.000
dtype: category
Categories (5, float64): [0.995 < 3.000 < 5.000 < 7.000 < 9.000]
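The question also mentions qcut. As a rough sketch, the same retbins idea appears to carry over to pd.qcut; here labels=False is used so that the result is a plain float Series rather than a category. The choice of q=4 and the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

values = pd.Series(np.arange(11))

# Quantile-based binning: integer bin codes plus the quantile edges.
codes, edges = pd.qcut(values, q=4, labels=False, retbins=True)

# Midpoint of each quantile bin.
centres = (edges[:-1] + edges[1:]) / 2

# Map every value to the midpoint of its bin, keeping the original index.
mid = pd.Series(centres[codes.to_numpy()], index=values.index)
```

Note that with qcut the bins hold roughly equal numbers of observations, so the midpoints are generally not evenly spaced.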

answered Oct 23 '22 21:10

chrisb

Tags:

python

pandas

binning