How to use pd.cut() across columns of a data frame?

Tags:

python

pandas

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.rand(10, 4))
>>> pd.cut(df, [0, 0.5, 1])

ValueError: Input array must be 1 dimensional

How can I get pd.cut() to work across all columns of a data frame?


asked Apr 29 '19 17:04

HappyPy


2 Answers

Use apply

df.apply(pd.cut, bins=[0,0.5,1])

By default (axis=0), apply passes each column to pd.cut in turn. Pass axis=1 if you want pd.cut to be called once per row instead.
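As a rough sketch of the apply approach on a reproducible frame (the seed and the "low"/"high" labels are my own choices for illustration, not part of the question):

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (the seed is arbitrary)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 4)))

# axis=0 is the default: pd.cut receives one column (a Series) per call,
# and extra keyword arguments such as bins are forwarded to pd.cut
binned = df.apply(pd.cut, bins=[0, 0.5, 1])

# Optional: replace the interval labels with readable names
labelled = df.apply(pd.cut, bins=[0, 0.5, 1], labels=["low", "high"])

print(binned.head())
print(labelled.head())
```

Note that the default intervals are (0, 0.5] and (0.5, 1], so a value of exactly 0 would become NaN unless you also pass include_lowest=True.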

answered Oct 20 '22 19:10

rafaelc

If you don't mind a slightly different type of labeling, numpy.digitize provides a vectorized n-d solution.

np.digitize(df, bins=[0, 0.5, 1.0])

array([[2, 2, 2, 2],
       [1, 2, 2, 2],
       [1, 1, 2, 1],
       [2, 1, 2, 1],
       [2, 1, 2, 1],
       [2, 2, 2, 2],
       [1, 2, 1, 1],
       [2, 1, 2, 2],
       [2, 2, 1, 1],
       [2, 1, 2, 1]], dtype=int64)

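The label 1 corresponds to the interval [0, 0.5) and 2 to [0.5, 1.0). Values below the first bin edge get 0 and values at or above the last edge get 3. Note that, by default, np.digitize closes intervals on the left, whereas pd.cut closes them on the right, e.g. (0, 0.5].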
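To make the label correspondence concrete, here is a small sketch on a hand-written frame (the column names and values are made up for illustration). It compares the np.digitize labels with the integer codes that pd.cut returns when given right=False and labels=False:

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame; values chosen to include a bin edge (0.5)
df = pd.DataFrame({"a": [0.1, 0.5, 0.9], "b": [0.7, 0.2, 0.5]})
bins = [0, 0.5, 1.0]

# np.digitize with the default right=False: label i means bins[i-1] <= x < bins[i]
codes = pd.DataFrame(
    np.digitize(df.to_numpy(), bins=bins),
    index=df.index,
    columns=df.columns,
)

# pd.cut with right=False uses the same half-open intervals [0, 0.5) and [0.5, 1.0);
# labels=False returns integer bin indices starting at 0
cut_codes = df.apply(pd.cut, bins=bins, right=False, labels=False)

# The two labelings agree once the digitize labels are shifted down by 1
same = bool((codes.to_numpy() - 1 == cut_codes.to_numpy()).all())
print(codes)
print(same)
```

If you need the interval labels rather than integers, you would still have to go through pd.cut or map the integer codes to labels yourself.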

Performance

df = pd.DataFrame(np.random.rand(1000, 1000))

%timeit pd.DataFrame(np.digitize(df, bins=[0, 0.5, 1.0]), columns=df.columns)
13.2 ms ± 36.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.apply(pd.cut, bins=[0, 0.5, 1])
3.11 s ± 12.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit pd.cut(df.stack(),[0,0.5,1]).unstack()
1.48 s ± 3.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

answered Oct 20 '22 17:10

user3483203
