I've got a data frame and want to filter or bin by a range of values and then get the counts of values in each bin.
Currently, I'm doing this:
x = 5
y = 17
z = 33
filter_values = [x, y, z]
filtered_a = df[df.filtercol <= x]
a_count = filtered_a.filtercol.count()
filtered_b = df[df.filtercol > x]
filtered_b = filtered_b[filtered_b.filtercol <= y]
b_count = filtered_b.filtercol.count()
filtered_c = df[df.filtercol > y]
c_count = filtered_c.filtercol.count()
But is there a more concise way to accomplish the same thing?
Perhaps you are looking for pandas.cut:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(50), columns=['filtercol'])
filter_values = [0, 5, 17, 33]
out = pd.cut(df.filtercol, bins=filter_values)
counts = out.value_counts()  # pd.value_counts(out) is deprecated in modern pandas
# counts is a Series
print(counts)
yields
(17, 33] 16
(5, 17] 12
(0, 5] 5
To reorder the result so the bin ranges appear in order, you could use
counts.sort_index()
which yields
(0, 5] 5
(5, 17] 12
(17, 33] 16
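One caveat: by default the intervals are open on the left, so with bins=[0, 5, 17, 33] the value 0 itself (and anything above 33) is not counted and becomes NaN. If you want the lowest edge included, a minimal sketch using cut's include_lowest parameter:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(50), columns=['filtercol'])

# include_lowest=True closes the first interval on the left,
# so the value 0 lands in the first bin instead of becoming NaN
out = pd.cut(df.filtercol, bins=[0, 5, 17, 33], include_lowest=True)
counts = out.value_counts().sort_index()
print(counts)
```

The first bin is then labelled (-0.001, 5.0] and counts 6 values (0 through 5); values above 33 are still dropped.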
Thanks to nivniv and InLaw for this improvement.
See also Discretization and quantiling.
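For the quantiling side of that, pandas also provides qcut, which picks the bin edges from quantiles of the data so each bin holds a (near-)equal number of rows. A short sketch on the same example frame:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(50), columns=['filtercol'])

# q=4 splits the data at its quartiles into four roughly equal-sized bins
out = pd.qcut(df.filtercol, q=4)
counts = out.value_counts().sort_index()
print(counts)
```

Unlike cut with explicit edges, every value falls into some bin here, so the counts always sum to the length of the column.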