How do you create a new bin/bucket variable using pd.qcut in Python?
This might seem elementary to experienced users, but I wasn't clear on it, and it was surprisingly hard to search for on Stack Overflow/Google. Some thorough searching turned up a related question (Assignment of qcut as new column), but it didn't quite answer mine because it stopped short of the last step: putting every row into numbered bins (i.e. 1, 2, ...).
Pandas: Data Manipulation - qcut() function
Discretize a variable into equal-sized buckets based on rank or on sample quantiles. For example, 1000 values for 10 quantiles would produce a Categorical object indicating the quantile membership of each data point.
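A minimal sketch of that 1000-values-into-10-quantiles example, assuming randomly generated data (the seed and the normal distribution are arbitrary choices for illustration). Passing a plain NumPy array means qcut hands back a Categorical, and counting its members shows each decile holding about 100 values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 1000 random values, seeded so the run is repeatable
rng = np.random.default_rng(0)
values = rng.normal(size=1000)

# Discretize into 10 quantile-based buckets (deciles)
deciles = pd.qcut(values, 10)

# With an ndarray input, qcut returns a Categorical whose categories are intervals
print(isinstance(deciles, pd.Categorical))
print(len(deciles.categories))

# Count how many values fall in each decile; expect roughly 1000 / 10 = 100 apiece
counts = pd.Series(deciles).value_counts(sort=False)
print(counts)
```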
Use pd.cut() for binning data based on the range of possible values. Use pd.qcut() for binning data based on the actual distribution of values.
The major distinction is that qcut computes the bin edges from the sample quantiles so that each bin holds (roughly) the same number of observations; as a result, the bin ranges vary. cut, on the other hand, uses bin edges that you define explicitly, or equal-width edges when you only pass a number of bins.
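To make the contrast concrete, here is a small sketch on made-up, right-skewed data (the numbers are hypothetical). With 4 equal-width bins, cut crowds almost everything into the first bin, while qcut places the edges at the quartiles so each bin gets the same count.

```python
import pandas as pd

# Hypothetical skewed data: mostly small values plus two large outliers
data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100])

# cut: 4 equal-width bins spanning min..max, so counts per bin are uneven
by_range = pd.cut(data, 4)
range_counts = by_range.value_counts(sort=False)
print(range_counts)

# qcut: 4 bins split at the sample quartiles, so counts per bin are equal here
by_rank = pd.qcut(data, 4)
rank_counts = by_rank.value_counts(sort=False)
print(rank_counts)
```

With ties in the data the qcut counts can drift from perfectly equal, which is why "roughly" matters above.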
In pandas 0.15.0 or newer, pd.qcut returns a Series (with categorical dtype), not a bare Categorical, if the input is a Series, as it is in your case. If you set labels=False, qcut returns the integer indicators of the bins instead of interval labels (as a Series of integers when the input is a Series).
So to future-proof your code, you could use
data3['bins_spd'] = pd.qcut(data3['spd_pct'], 5, labels=False)
or pass a NumPy array to pd.qcut so that you get a Categorical as the return value.
Note that the Categorical attribute labels is deprecated. Use codes instead:
data3['bins_spd'] = pd.qcut(data3['spd_pct'].values, 5).codes
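The sketch below puts both approaches side by side on a small made-up DataFrame (data3 and its spd_pct values are hypothetical stand-ins for the question's data). It also shows the .cat.codes route for Series input, and adds 1 to get the 1, 2, ... numbering the question asked for, since qcut's integer bin indicators start at 0.

```python
import pandas as pd

# Hypothetical stand-in for the question's data3 DataFrame
data3 = pd.DataFrame(
    {"spd_pct": [0.02, 0.15, 0.31, 0.44, 0.58, 0.63, 0.77, 0.85, 0.91, 0.99]}
)

# Option 1: Series input with labels=False -> integer bins 0..4
data3["bins_spd"] = pd.qcut(data3["spd_pct"], 5, labels=False)

# Option 2: ndarray input -> Categorical, whose .codes are the same integers
codes_from_array = pd.qcut(data3["spd_pct"].values, 5).codes

# Series input without labels=False: reach the codes via the .cat accessor
codes_from_series = pd.qcut(data3["spd_pct"], 5).cat.codes

# The bins are 0-based; add 1 to number them 1..5 as the question asks
data3["bins_spd_1based"] = data3["bins_spd"] + 1
print(data3)
```

Passing labels=[1, 2, 3, 4, 5] to qcut is another way to get 1-based bins, though the resulting column then has categorical dtype rather than plain integers.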