I have a DataFrame with three columns: Date, Advertiser and ID. I grouped the data first to see whether the volumes of some Advertisers are too small (for example, where count() is less than 500), and then I want to drop those rows from the grouped table.
df.groupby(['Date','Advertiser']).ID.count()
The result looks like this:
Date     Advertiser
2016-01  A    50000
         B       50
         C     4000
         D    24000
2016-02  A     6800
         B     7800
         C      123
2016-03  B     1111
         E     8600
         F      500
I want the result to be this:
Date     Advertiser
2016-01  A    50000
         C     4000
         D    24000
2016-02  A     6800
         B     7800
2016-03  B     1111
         E     8600
Follow-up question:
What if I want to filter out rows in the groupby based on the total count() within each Date category? For example, I want to keep only dates whose total count() is larger than 15000. The table I want looks like this:
Date     Advertiser
2016-01  A    50000
         B       50
         C     4000
         D    24000
2016-02  A     6800
         B     7800
         C      123
You have a Series object after the groupby, which can be filtered based on value with a chained lambda filter:
df.groupby(['Date','Advertiser']).ID.count()[lambda x: x >= 500]
#Date     Advertiser
#2016-01  A    50000
#         C     4000
#         D    24000
#2016-02  A     6800
#         B     7800
#2016-03  B     1111
#         E     8600
#         F      500
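Note that x >= 500 keeps F (a count of exactly 500); use x > 500 if you want that row dropped, as in your desired table.

For the follow-up, one option is to sum the per-advertiser counts within each Date and keep only the dates whose total is above the threshold, using a level-based groupby with transform('sum'). This is a minimal sketch on the same counts Series; the 15000 threshold is the one from your question, so adjust it as needed:

counts = df.groupby(['Date','Advertiser']).ID.count()
# Total count per Date, broadcast back onto the (Date, Advertiser) index
date_totals = counts.groupby(level='Date').transform('sum')
# Keep only the rows belonging to dates whose total count exceeds 15000
counts[date_totals > 15000]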