
Pandas groupby and qcut

Is there a way to combine pandas groupby and qcut so that they return a single column of tiles computed within each group? Specifically, suppose I have two groups of data and want qcut applied to each group separately, with the results written to one column. This would be similar to MS SQL Server's NTILE() function used with PARTITION BY.

         A    B  C
    0  foo  0.1  1
    1  foo  0.5  2
    2  foo  1.0  3
    3  bar  0.1  1
    4  bar  0.5  2
    5  bar  1.0  3

In the DataFrame above, I would like to apply qcut to column B, partitioned by column A, to produce column C.

mhabiger asked Oct 16 '13 12:10





1 Answer

    import pandas as pd

    df = pd.DataFrame({'A': 'foo foo foo bar bar bar'.split(),
                       'B': [0.1, 0.5, 1.0] * 2})

    # Within each group of A, cut B into 3 quantile bins labelled 1, 2, 3
    df['C'] = df.groupby(['A'])['B'].transform(
        lambda x: pd.qcut(x, 3, labels=range(1, 4)))
    print(df)

yields

         A    B  C
    0  foo  0.1  1
    1  foo  0.5  2
    2  foo  1.0  3
    3  bar  0.1  1
    4  bar  0.5  2
    5  bar  1.0  3
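One caveat if you use this as a stand-in for NTILE: `pd.qcut` raises a `ValueError` when tied values make two bin edges coincide, which can happen easily inside small groups. The following is a minimal sketch of one possible workaround, assuming it is acceptable to break ties by order of appearance. It ranks the values within each group first and then cuts the ranks. The helper name `ntile_within` and the modified sample data (a tie added to group `bar`) are illustrative assumptions, and the tile sizes may not match SQL Server's NTILE exactly for every group size.

```python
import pandas as pd

# Same layout as the example above, but group 'bar' now contains a tie in B.
df = pd.DataFrame({'A': 'foo foo foo bar bar bar'.split(),
                   'B': [0.1, 0.5, 1.0, 0.5, 0.5, 1.0]})


def ntile_within(values, n):
    """Hypothetical helper: 1-based tile numbers for one group's values."""
    # method='first' gives tied values distinct ranks (by order of appearance),
    # so ties no longer produce duplicate quantile edges in qcut.
    ranks = values.rank(method='first')
    # labels=False returns integer bin codes starting at 0; shift to start at 1.
    return pd.qcut(ranks, n, labels=False) + 1


# Apply the helper to B separately within each group of A.
df['C'] = df.groupby('A')['B'].transform(lambda x: ntile_within(x, 3))
print(df)
```

Using `labels=False` also keeps the result as plain integers rather than a categorical, which may be more convenient if the tiles feed into later arithmetic or joins.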
unutbu answered Sep 21 '22 13:09
