Suppose, I have a dataframe in pandas like below:
campaignname category_type amount
A cat_A_0 2.0
A cat_A_1 1.0
A cat_A_2 3.0
A cat_A_2 3.0
A cat_A_2 4.0
B cat_B_0 3.0
C cat_C_0 1.0
C cat_C_1 2.0
I am using the following code to group the above dataframe (say it's assigned variable name df
) by different columns as follows:
for name, gp in df.groupby('campaignname'):
sorted_gp = gp.groupby(['campaignname', 'category_type']).sum().sort_values(['amount'], ascending=False)
# I'd like to know how to select this in a cleaner/more concise way
first_row = [sorted_gp.iloc[0].name[0], sorted_gp.iloc[0].name[1], sorted_gp.iloc[0].values.tolist()[0]]
The purpose of the above code is to first groupby
the raw data on campaignname
column, then in each of the resulting group, I'd like to group again by both campaignname
and category_type
, and finally, sort by amount
column to choose the first row that comes up (the one with the highest amount
in each group. Specifically for the above example, I'd like to get results like this:
first_row = ['A', 'cat_A_2', 4.0] # for the first group
first_row = ['B', 'cat_B_0', 3.0] # for the second group
first_row = ['C', 'cat_C_1', 2.0] # for the third group
etc.
As you can see, I'm using a rather (in my opinion) 'ugly' way to retrieve the first row of each sorted group, but since I'm new to pandas, I don't know a better/cleaner way to accomplish this. If anyone could let me know a way to select the first row in a sorted group from a pandas dataframe, I'd greatly appreciate it. Thank you in advance for your answers/suggestions!
Select & print first row of dataframe using head() It will return the first row of dataframe as a dataframe object. Using the head() function, we fetched the first row of dataframe as a dataframe and then just printed it.
pandas. Series is easier to get the value. You can get the first row with iloc[0] and the last row with iloc[-1] . If you want to get the value of the element, you can do with iloc[0]['column_name'] , iloc[-1]['column_name'] .
To get the value of the first row of a given column use pandas. DataFrame. iloc[] property .
IIUC you can do it this way:
In [83]: df.groupby('campaignname', as_index=False) \
.apply(lambda x: x.nlargest(1, columns=['amount'])) \
.reset_index(level=1, drop=1)
Out[83]:
campaignname category_type amount
0 A cat_A_2 4.0
1 B cat_B_0 3.0
2 C cat_C_1 2.0
or:
In [76]: df.sort_values('amount', ascending=False).groupby('campaignname').head(1)
Out[76]:
campaignname category_type amount
4 A cat_A_2 4.0
5 B cat_B_0 3.0
7 C cat_C_1 2.0
My preferred way to do this is with idxmax
. It returns the index of the maximum value. I subsequently use that index to slice df
df.loc[df.groupby('campaignname').amount.idxmax()]
campaignname category_type amount
4 A cat_A_2 4.0
5 B cat_B_0 3.0
7 C cat_C_1 2.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With