Python Pandas: remove entries based on the number of occurrences

I'm trying to remove entries from a data frame whose tag occurs fewer than 100 times. The data frame, data, looks like this:

pid   tag
1     23
1     45
1     62
2     24
2     45
3     34
3     25
3     62

Now I count the number of tag occurrences like this:

bytag = data.groupby('tag').aggregate(np.count_nonzero)

But then I can't figure out how to remove the entries whose tags have a low count.

asked Nov 19 '12 01:11

sashkello

2 Answers

New in pandas 0.12: groupby objects have a filter method, which handles this kind of operation:

In [11]: g = data.groupby('tag')

In [12]: g.filter(lambda x: len(x) > 1)  # pandas 0.13.1
Out[12]:
   pid  tag
1    1   45
2    1   62
4    2   45
7    3   62

The function (the first argument of filter) is applied to each group (subframe), and the result contains the rows of the original DataFrame that belong to groups for which the function returned True.
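To make the threshold explicit, here is a minimal sketch of the same filter call wrapped in a helper. The names drop_rare_tags and min_count are made up for illustration and are not part of pandas; the sample data is the eight-row frame from the question, so the threshold is 2 rather than the 100 the question asks about.

```python
import pandas as pd

# Sample data from the question.
data = pd.DataFrame({
    "pid": [1, 1, 1, 2, 2, 3, 3, 3],
    "tag": [23, 45, 62, 24, 45, 34, 25, 62],
})


def drop_rare_tags(df, min_count):
    """Keep only rows whose tag appears at least min_count times.

    Hypothetical helper: it just wraps groupby(...).filter(...).
    """
    # The lambda receives one sub-DataFrame per tag and must return a bool.
    return df.groupby("tag").filter(lambda group: len(group) >= min_count)


kept = drop_rare_tags(data, min_count=2)
print(kept)
```

For the real data set, the call would presumably be drop_rare_tags(data, min_count=100).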

Note: in 0.12 the rows come back in a different order than in the original DataFrame; this was fixed in 0.13+:

In [21]: g.filter(lambda x: len(x) > 1)  # pandas 0.12
Out[21]:
   pid  tag
1    1   45
4    2   45
2    1   62
7    3   62

answered Sep 17 '22 18:09

Andy Hayden

Edit: Thanks to @WesMcKinney for showing this much more direct way:

data[data.groupby('tag').pid.transform(len) > 1]
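It may help to see the intermediate values behind that one-liner. The sketch below splits it into two steps; group_sizes and mask are names chosen here for illustration. transform(len) returns one value per original row (the size of that row's group), which is why the comparison produces a boolean mask aligned with data.

```python
import pandas as pd

# Sample data from the question.
data = pd.DataFrame({
    "pid": [1, 1, 1, 2, 2, 3, 3, 3],
    "tag": [23, 45, 62, 24, 45, 34, 25, 62],
})

# One entry per row of data: how many rows share that row's tag.
group_sizes = data.groupby("tag")["pid"].transform(len)

# Boolean mask with the same index as data.
mask = group_sizes > 1

print(group_sizes.tolist())
print(data[mask])
```

Because the mask is built against the original index, the surviving rows keep their original order.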

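import pandas
import numpy as np

data = pandas.DataFrame(
    {'pid' : [1,1,1,2,2,3,3,3],
     'tag' : [23,45,62,24,45,34,25,62],
     })

# Count rows per tag. np.count_nonzero counts non-zero pid values,
# which equals the row count here because no pid is 0.
bytag = data.groupby('tag').aggregate(np.count_nonzero)
# Tags that occur at least twice.
tags = bytag[bytag.pid >= 2].index
# Keep only the rows whose tag is in that set.
print(data[data['tag'].isin(tags)])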

yields

   pid  tag
1    1   45
2    1   62
4    2   45
7    3   62
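One caveat with counting through np.count_nonzero is that it counts non-zero values, so a pid of 0 would not be counted. A possible variant, sketched here rather than taken from the answer, counts the tag column directly with value_counts and keeps the same isin step. The zeros in pid below are hypothetical data added only to show that the result no longer depends on pid.

```python
import pandas as pd

# Same tags as in the question, but pid now contains zeros (hypothetical).
data = pd.DataFrame({
    "pid": [0, 0, 0, 2, 2, 3, 3, 3],
    "tag": [23, 45, 62, 24, 45, 34, 25, 62],
})

# Rows per tag, computed from the tag column alone.
tag_counts = data["tag"].value_counts()

# Tags that occur at least twice.
frequent_tags = tag_counts[tag_counts >= 2].index

# Keep only the rows whose tag is frequent.
result = data[data["tag"].isin(frequent_tags)]
print(result)
```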

answered Sep 21 '22 18:09

unutbu
