
Better binning in pandas [duplicate]

I've got a data frame and want to filter or bin by a range of values and then get the counts of values in each bin.

Currently, I'm doing this:

x = 5
y = 17
z = 33
filter_values = [x, y, z]
filtered_a = df[df.filtercol <= x]
a_count = filtered_a.filtercol.count()

filtered_b = df[df.filtercol > x]
filtered_b = filtered_b[filtered_b.filtercol <= y]
b_count = filtered_b.filtercol.count()

filtered_c = df[df.filtercol > y]
c_count = filtered_c.filtercol.count()

But is there a more concise way to accomplish the same thing?

monkut asked Jan 22 '13 03:01





1 Answer

Perhaps you are looking for pandas.cut:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(50), columns=['filtercol'])
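filter_values = [0, 5, 17, 33]
# Each value is assigned to a half-open interval (left, right]
out = pd.cut(df.filtercol, bins=filter_values)
# Count rows per bin; values outside every bin become NaN and are not counted
counts = out.value_counts()
# counts is a Series, sorted by count (largest first)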
print(counts)

yields

(17, 33]    16
(5, 17]     12
(0, 5]       5

To reorder the result so the bin ranges appear in order, you could use

counts.sort_index()

which yields

(0, 5]       5
(5, 17]     12
(17, 33]    16

Thanks to nivniv and InLaw for this improvement.
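
Note that the bins [0, 5, 17, 33] leave out 0 and everything above 33, so the counts differ from the three filters in the question (<= x, between x and y, > y). A minimal sketch of one way to match those filters, assuming open-ended outer bins are what is wanted, is to use infinite edges. The variable names edges and binned are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"filtercol": np.arange(50)})

# Edges (-inf, 5], (5, 17], (17, inf] mirror the question's three filters:
# filtercol <= 5, 5 < filtercol <= 17, and filtercol > 17.
edges = [-np.inf, 5, 17, np.inf]
binned = pd.cut(df["filtercol"], bins=edges)

# Count rows per bin and keep the bins in ascending order.
counts = binned.value_counts().sort_index()
print(counts)
```

With infinite outer edges every row lands in some bin, so the counts add up to len(df).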


See also Discretization and quantiling.
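
For the quantiling side, pandas.qcut chooses the bin edges from the data itself instead of taking fixed edges. The following is a small sketch on the same example frame; q=4 (quartiles) is an arbitrary choice for illustration, not something from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"filtercol": np.arange(50)})

# qcut derives the edges from the data's quantiles; q=4 asks for quartiles,
# so each bin holds roughly the same number of rows.
quartiles = pd.qcut(df["filtercol"], q=4)

# Count rows per quantile bin, in ascending bin order.
counts = quartiles.value_counts().sort_index()
print(counts)
```

Use cut when the thresholds are known in advance (as in the question) and qcut when roughly equal-sized groups matter more than the exact edges.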

unutbu answered Oct 16 '22 17:10
