Is there a way to structure Pandas groupby and qcut commands to return one column that has nested tiles? Specifically, suppose I have 2 groups of data and I want qcut applied to each group and then return the output to one column. This would be similar to MS SQL Server's ntile() command that allows Partition by().
         A    B  C
    0  foo  0.1  1
    1  foo  0.5  2
    2  foo  1.0  3
    3  bar  0.1  1
    4  bar  0.5  2
    5  bar  1.0  3
In the dataframe above I would like to apply the qcut function to B while partitioning on A to return C.
The pandas documentation describes qcut as a “Quantile-based discretization function.” This basically means that qcut tries to divide up the underlying data into equal sized bins. The function defines the bins using percentiles based on the distribution of the data, not the actual numeric edges of the bins.
Qcut (quantile-cut) differs from cut in the sense that, in qcut, the number of elements in each bin will be roughly the same, but this will come at the cost of differently sized interval widths.
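To make the difference concrete, here is a small sketch comparing the two functions. The sample values and the "low"/"mid"/"high" labels are made up for illustration; the single outlier (100) is there to show how equal-width bins can end up very uneven.

```python
import pandas as pd

# Hypothetical sample data with one large outlier
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 100])
labels = ["low", "mid", "high"]

# cut: three intervals of equal width across the range 1..100
equal_width = pd.cut(values, 3, labels=labels)

# qcut: three intervals chosen so each holds roughly the same number of values
equal_count = pd.qcut(values, 3, labels=labels)

cut_counts = equal_width.value_counts().to_dict()
qcut_counts = equal_count.value_counts().to_dict()

print(cut_counts)   # most values land in "low"
print(qcut_counts)  # three values in each bin
```

With this data, cut puts eight of the nine values in the first bin, while qcut puts three values in each bin.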
You can group DataFrame rows into lists by calling DataFrame.groupby() on the column of interest, selecting the column whose values you want to collect, and then using Series.apply(list) to get one list per group.
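As a rough sketch, applied to the same foo/bar frame used in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "A": "foo foo foo bar bar bar".split(),
    "B": [0.1, 0.5, 1.0] * 2,
})

# Collect the values of B into one Python list per value of A
lists_per_group = df.groupby("A")["B"].apply(list)
print(lists_per_group)
```

The result is a Series indexed by the group key (bar, foo), where each entry is the list of B values for that group.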
The Hello, World! of pandas GroupBy
You call .groupby() and pass the name of the column that you want to group on, which is "state". Then, you use ["last_name"] to specify the columns on which you want to perform the actual aggregation. You can pass a lot more than just a single column name to .
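The dataset that passage refers to is not shown here, so the following is only a sketch with a made-up frame. The column names "state" and "last_name" come from the text; the rows and the variable name people are assumptions.

```python
import pandas as pd

# Hypothetical data; only the column names come from the text above
people = pd.DataFrame({
    "state": ["NY", "NY", "CA", "TX", "CA", "NY"],
    "last_name": ["Smith", "Jones", "Lee", "Diaz", "Kim", "Brown"],
})

# Group on "state", then aggregate the "last_name" column
counts = people.groupby("state")["last_name"].count()
print(counts)
```

This counts the non-null last names in each state.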
    import pandas as pd

    df = pd.DataFrame({'A': 'foo foo foo bar bar bar'.split(),
                       'B': [0.1, 0.5, 1.0] * 2})

    # Apply qcut separately within each group of A and write the tiles to C
    df['C'] = df.groupby(['A'])['B'].transform(
        lambda x: pd.qcut(x, 3, labels=range(1, 4)))
    print(df)
yields
         A    B  C
    0  foo  0.1  1
    1  foo  0.5  2
    2  foo  1.0  3
    3  bar  0.1  1
    4  bar  0.5  2
    5  bar  1.0  3
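One caveat if you want behaviour closer to SQL's NTILE: qcut computes bin edges from the values themselves, so a group with many repeated values can produce duplicate edges and raise an error. A common workaround, sketched below, is to rank the values within each group first and apply qcut to the ranks. This is a variation on the answer above, not part of it; the sample data and the column name tile are made up for illustration.

```python
import pandas as pd

# Hypothetical data with repeated values inside each group
df = pd.DataFrame({
    "A": "foo foo foo foo bar bar bar bar".split(),
    "B": [0.1, 0.1, 0.1, 1.0, 0.2, 0.2, 0.9, 0.9],
})

n_tiles = 2

# rank(method="first") breaks ties by order of appearance, so ranks are unique
# and qcut can always split them; labels=False returns 0-based bin codes
df["tile"] = df.groupby("A")["B"].transform(
    lambda x: pd.qcut(x.rank(method="first"), n_tiles, labels=False) + 1
)
print(df)
```

Here each group of four rows is split into two tiles of two rows each, even though the raw values in the foo group are mostly identical.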