Repeating values in a "group by" pandas dataframe

Tags:

python

pandas

dataframe

I have the following pandas DataFrame:

               email   cat  class_price
0   email1@gmail.com  cat1            1
1   email2@gmail.com  cat2            2
2   email3@gmail.com  cat2            4
3   email1@gmail.com  cat2            4
4   email2@gmail.com  cat2            1
5   email3@gmail.com  cat1            3
6   email1@gmail.com  cat1            2
7   email2@gmail.com  cat2            1
8   email3@gmail.com  cat2            4
9   email1@gmail.com  cat2            2
10  email2@gmail.com  cat3            1
11  email3@gmail.com  cat1            1

I want to group by email and cat, and for each group take the max of class_price.

I'm using:

test_df2 = test_df.groupby(['email','cat'])['class_price'].max()

The output is:

email             cat
email1@gmail.com  cat1    2
                  cat2    4
email2@gmail.com  cat2    2
                  cat3    1
email3@gmail.com  cat1    3
                  cat2    4

But how can I get a result where the grouped columns keep their repeated values, so that it can be written out as a proper table with every value filled in:

email               cat      maxvalue
email1@gmail.com    cat2     2
email1@gmail.com    cat1     2
email3@gmail.com    cat3     3

Note: the example output does not correspond to the example input; it is only there to illustrate the layout I want.

asked Apr 17 '16 12:04

stackit

1 Answer

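You can call reset_index on the grouped result, as suggested in another answer, or pass as_index=False to groupby so that the grouping keys stay as ordinary columns:
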
test_df2 = test_df.groupby(['email','cat'], as_index=False)['class_price'].max()
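
For reference, here is a minimal sketch of both approaches run against the sample data from the question. The variable names flat_a, flat_b and result are only illustrative, and the rename to maxvalue is an assumption based on the column name in the desired output.

```python
import pandas as pd

# Sample data rebuilt from the question.
test_df = pd.DataFrame({
    "email": ["email1@gmail.com", "email2@gmail.com", "email3@gmail.com"] * 4,
    "cat": ["cat1", "cat2", "cat2", "cat2", "cat2", "cat1",
            "cat1", "cat2", "cat2", "cat2", "cat3", "cat1"],
    "class_price": [1, 2, 4, 4, 1, 3, 2, 1, 4, 2, 1, 1],
})

# Option 1: keep the grouping keys as regular columns.
flat_a = test_df.groupby(["email", "cat"], as_index=False)["class_price"].max()

# Option 2: group as usual, then move the MultiIndex levels back into columns.
flat_b = test_df.groupby(["email", "cat"])["class_price"].max().reset_index()

# Optional: rename the aggregated column (name taken from the desired output).
result = flat_a.rename(columns={"class_price": "maxvalue"})
print(result)
```

Either way you get a flat DataFrame in which email is repeated on every row, so it can be passed to to_csv(index=False) or similar and come out as a plain table.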

answered Sep 22 '22 12:09

Blue Bird
