This must be a simple question, but it has been bugging me for a while.
For the DataFrame below:
df = pd.DataFrame({'c0': ['a','b','a'],'c1': ['a','bb','a'],'c2':[10,20,30]})
  c0  c1  c2
0  a   a  10
1  b  bb  20
2  a   a  30
How do I get only the groups where the count is > 1?
I have tried:
df.groupby(['c0','c1'])['c2'].count()
c0  c1
a   a     2
b   bb    1
The required output is:
c0  c1
a   a     2
I am looking for something other than
x = df.groupby(['c0','c1'])['c2'].count()
x[x>1]
i.e. a one-liner answer.
Use GroupBy.transform for a Series with the same size as the original DataFrame:
df1 = df[df.groupby(['c0','c1'])['c2'].transform('count') > 1]
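Here, transform('count') broadcasts each group's size back onto every row, so the boolean mask keeps every row whose (c0, c1) combination occurs more than once. A minimal sketch with the df from the question, showing the intermediate mask:

import pandas as pd

df = pd.DataFrame({'c0': ['a','b','a'], 'c1': ['a','bb','a'], 'c2': [10,20,30]})

# per-row group sizes, aligned with df's index: 2, 1, 2
sizes = df.groupby(['c0','c1'])['c2'].transform('count')
df1 = df[sizes > 1]
#   c0 c1  c2
# 0  a  a  10
# 2  a  a  30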
Or use DataFrame.duplicated to filter all duplicated rows by the specified list of columns:
df1 = df[df.duplicated(['c0','c1'], keep=False)]
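With keep=False, duplicated marks every member of a duplicated (c0, c1) pair as True, not only the second and later occurrences, which is what makes it equivalent to the transform mask above. Roughly, for the same df:

df.duplicated(['c0','c1'], keep=False)
# 0     True
# 1    False
# 2     True
# dtype: bool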
If performance is not important or the DataFrame is small, use DataFrameGroupBy.filter:
df1 = df.groupby(['c0','c1']).filter(lambda x: len(x) > 1)
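filter calls the Python lambda once per group, which is why it tends to be slow when there are many groups; on the sample data it returns the same two rows as the other approaches:

df.groupby(['c0','c1']).filter(lambda x: len(x) > 1)
#   c0 c1  c2
# 0  a  a  10
# 2  a  a  30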