Applying pandas qcut bins to new data

Tags: python, pandas

I am using pandas qcut to split some data into 20 bins as part of data preparation for training a binary classification model, like so:

data['VAR_BIN'] = pd.qcut(cc_data[var], 20, labels=False)

How can I apply the same binning logic derived from the qcut statement above to a new set of data, say for model validation purposes? Is there an easy way to do this?

Thanks

asked Jun 19 '16 10:06

GRN

1 Answer

You can do it by passing retbins=True.

Consider the following DataFrame:

import pandas as pd
import numpy as np
prng = np.random.RandomState(0)
df = pd.DataFrame(prng.randn(100, 2), columns=["A", "B"])

pd.qcut(df["A"], 20, retbins=True, labels=False) returns a tuple whose second element is the bins. So you can do:

ser, bins = pd.qcut(df["A"], 20, retbins=True, labels=False)

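ser holds each row's integer bin code (0 to 19, because labels=False), and bins holds the 21 break points. You can now pass bins to pd.cut to apply the same grouping to the other column; any value that falls outside the range covered by bins comes back as NaN: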

pd.cut(df["B"], bins=bins, labels=False, include_lowest=True)
Out[38]: 
0     13
1     19
2      3
3      9
4     13
5     17
...
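
For the validation use case in the question, new data can contain values below the smallest or above the largest training edge, and pd.cut would mark those rows as NaN. One possible way around that, shown here only as a sketch, is to widen the two outer edges to infinity before reusing the bins. The names train, valid and open_bins are made up for this illustration, and the random data merely stands in for real training and validation columns.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
# Hypothetical stand-ins for a training column and a validation column.
train = pd.Series(rng.randn(1000), name="x")
valid = pd.Series(rng.randn(200) * 1.5, name="x")  # wider spread than train

# Learn the quantile edges from the training data only.
train_codes, bins = pd.qcut(train, 20, retbins=True, labels=False)

# Assumption: out-of-range values should fall into the first or last bin
# rather than become NaN, so the outer edges are opened up.
open_bins = bins.copy()
open_bins[0] = -np.inf
open_bins[-1] = np.inf

# Reuse the training edges on the validation data.
valid_codes = pd.cut(valid, bins=open_bins, labels=False, include_lowest=True)
```

Whether clamping extreme values into the outer bins is appropriate depends on the model; leaving bins unchanged and handling the resulting NaN rows separately is an equally reasonable choice.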

answered Sep 30 '22 03:09

ayhan
