I have the following pandas DataFrame:
email cat class_price
0 [email protected] cat1 1
1 [email protected] cat2 2
2 [email protected] cat2 4
3 [email protected] cat2 4
4 [email protected] cat2 1
5 [email protected] cat1 3
6 [email protected] cat1 2
7 [email protected] cat2 1
8 [email protected] cat2 4
9 [email protected] cat2 2
10 [email protected] cat3 1
11 [email protected] cat1 1
I want to group by email and cat and, for each group, take the max of class_price.
I'm using:
test_df2 = test_df.groupby(['email','cat'])['class_price'].max()
The output is:
email cat
[email protected] cat1 2
cat2 4
[email protected] cat2 2
cat3 1
[email protected] cat1 3
cat2 4
But how can I get a result in which the grouped columns keep their repeated values, so that it can be written out as a proper table with every value filled in:
email cat maxvalue
[email protected] cat2 2
[email protected] cat1 2
[email protected] cat3 3
Note: the example output doesn't match the example input; it is only written to explain the idea.
The pandas Series.str.repeat() method repeats each string value in a Series, in place, a given number of times. A sequence can also be passed to set the repeat count for each element individually.
DataFrame looping (iteration) with a for statement: iterating over a pandas DataFrame directly yields its column labels; use iterrows() or itertuples() to go row by row.
Using the size() or count() method with DataFrame.groupby() gives the number of occurrences in each group: size() counts every row, while count() counts only the non-null values in each column.
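As a rough illustration of the difference between size() and count() on a groupby, here is a minimal sketch. The email addresses and the missing value are hypothetical stand-ins, since the addresses in the question are obscured.

```python
import pandas as pd

# Hypothetical data: placeholder emails, with one missing price added
# on purpose to show where size() and count() differ.
test_df = pd.DataFrame({
    "email": ["a@example.com", "a@example.com", "a@example.com",
              "b@example.com", "b@example.com"],
    "cat": ["cat1", "cat2", "cat2", "cat2", "cat3"],
    "class_price": [1, 2, None, 2, 1],
})

# size() counts every row in the group, including rows with missing values.
sizes = test_df.groupby("email").size()

# count() counts only the non-null values of the selected column.
counts = test_df.groupby("email")["class_price"].count()

print(sizes)
print(counts)
```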
You can use reset_index, as in the other answer, or pass as_index=False to groupby so that the grouping keys stay as ordinary columns:
test_df2 = test_df.groupby(['email','cat'], as_index=False)['class_price'].max()
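A minimal sketch of both approaches side by side, assuming a small stand-in DataFrame. The email addresses are hypothetical placeholders, since the real ones are obscured in the question, and the maxvalue column name is taken from the desired output.

```python
import pandas as pd

# Hypothetical stand-in data with placeholder emails.
test_df = pd.DataFrame({
    "email": ["a@example.com", "a@example.com", "a@example.com",
              "b@example.com", "b@example.com"],
    "cat": ["cat1", "cat2", "cat2", "cat2", "cat3"],
    "class_price": [1, 2, 4, 2, 1],
})

# Option 1: aggregate, then move the group keys from the index back
# into regular columns; name= labels the aggregated column.
via_reset = (
    test_df.groupby(["email", "cat"])["class_price"]
    .max()
    .reset_index(name="maxvalue")
)

# Option 2: as_index=False keeps the group keys as columns from the start.
via_as_index = (
    test_df.groupby(["email", "cat"], as_index=False)["class_price"]
    .max()
    .rename(columns={"class_price": "maxvalue"})
)

print(via_reset)
print(via_as_index)
```

Either way, every row carries its own email and cat values, so the result can be written out with to_csv or similar without blank cells in the grouped columns.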