I have a problem filtering a pandas DataFrame.
    city
    NYC
    NYC
    NYC
    NYC
    SYD
    SYD
    SEL
    SEL
    ...

    df.city.value_counts()
I would like to remove the rows for cities that appear fewer than 4 times, which in this example would be SYD and SEL.
Is there a way to do this without dropping them manually, city by city?
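An alternative that neither answer here uses is to count each city once with `value_counts()` and then keep the rows whose city is frequent enough via `isin()`. The sketch below makes one assumption: the question's frame is elided with "...", so the sample data is reconstructed from only the eight visible rows. The names `counts`, `keep` and `filtered` are illustrative.

```python
import pandas as pd

# Sample data modeled on the visible part of the question (the rest is elided).
df = pd.DataFrame({"city": ["NYC", "NYC", "NYC", "NYC", "SYD", "SYD", "SEL", "SEL"]})

# One count per distinct city: NYC -> 4, SYD -> 2, SEL -> 2.
counts = df["city"].value_counts()

# Cities that appear at least 4 times.
keep = counts[counts >= 4].index

# Keep only rows whose city is in that set; original index labels are preserved.
filtered = df[df["city"].isin(keep)]
print(filtered)
```

Because the threshold lives in one comparison (`counts >= 4`), changing the cut-off does not require touching any city names.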
Use the pandas.DataFrame.drop() method to delete rows that meet one or more conditions.
Python pandas: drop rows by index. To remove rows by index, pass the index label, or a list of labels if you are dropping several rows: df.drop(index). Here df is the DataFrame you are working on, and index is the label (number or name) of the row to remove.
Use drop() to delete rows based on a column value. As part of data cleaning, you may need to drop rows from a DataFrame when a column value matches a static value or the value of another column.
Deleting rows with drop (best for a small number of rows): drop references rows by their index values. Most typically this is an integer per row that increments from zero when you first load data into pandas. You can see the index when you run “data.
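A minimal sketch of dropping by index label, using a small frame modeled on the question's data (an assumption, since the real frame is elided). `drop()` returns a new DataFrame by default and leaves `df` unchanged. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"city": ["NYC", "NYC", "NYC", "NYC", "SYD", "SYD", "SEL", "SEL"]})

# Drop a single row by its index label.
one_dropped = df.drop(4)

# Drop several rows at once by passing a list of labels.
several_dropped = df.drop([4, 5, 6, 7])

print(several_dropped)
```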
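Applied to the question, the condition can be computed from the column itself. In the sketch below, each row is mapped to its city's total count, the index labels of the rare rows are collected, and those labels are passed to `drop()`. The sample data and the names `row_counts` and `rare_rows` are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"city": ["NYC", "NYC", "NYC", "NYC", "SYD", "SYD", "SEL", "SEL"]})

# Series.map with a Series looks each city up in value_counts(),
# giving every row the total count of its city.
row_counts = df["city"].map(df["city"].value_counts())

# Index labels of rows whose city appears fewer than 4 times.
rare_rows = df[row_counts < 4].index

# drop() removes those labels and returns a new DataFrame.
result = df.drop(rare_rows)
print(result)
```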
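Because `drop()` works on index *labels*, not positions, the two can diverge after slicing or filtering. The small sketch below uses made-up data to show this: after `iloc[2:]`, the remaining rows keep labels 2 and 3, so `drop(2)` removes the first remaining row and `drop(0)` raises `KeyError`.

```python
import pandas as pd

df = pd.DataFrame({"city": ["NYC", "SYD", "SEL", "NYC"]})

# Slicing by position keeps the original labels: tail has labels 2 and 3.
tail = df.iloc[2:]

# Label 2 is the first row of tail (position 0), so this leaves only label 3.
dropped = tail.drop(2)
print(dropped)

# Label 0 no longer exists in tail, so drop() raises KeyError.
try:
    tail.drop(0)
    missing_label_error = False
except KeyError:
    missing_label_error = True
print("label 0 missing:", missing_label_error)
```

If you need positional dropping instead, `df.reset_index(drop=True)` renumbers the labels from zero first.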
Solution one: filter
    df.groupby('city').filter(lambda x: len(x) > 3)

    Out[1743]:
      city
    0  NYC
    1  NYC
    2  NYC
    3  NYC
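A self-contained version of the filter answer, using sample data assumed from the question. `GroupBy.filter` calls the function once per group (each group is a sub-DataFrame) and keeps that group's rows only if the function returns True. The original row order and index labels are preserved. The name `min_count` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"city": ["NYC", "NYC", "NYC", "NYC", "SYD", "SYD", "SEL", "SEL"]})

min_count = 4  # keep cities with at least this many rows

# g is the sub-DataFrame for one city; len(g) is that city's row count.
result = df.groupby("city").filter(lambda g: len(g) >= min_count)
print(result)
```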
Solution two: transform
    # .copy() makes sub_df independent, which avoids a SettingWithCopyWarning if you modify it later
    sub_df = df[df.groupby('city').city.transform('count') > 3].copy()
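A runnable sketch of the transform answer, again on sample data assumed from the question. `transform('count')` returns a Series aligned with `df`, so every row carries the number of non-null `city` values in its group, and that Series can be used directly as a boolean mask. The name `group_sizes` and the extra `keep` column are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"city": ["NYC", "NYC", "NYC", "NYC", "SYD", "SYD", "SEL", "SEL"]})

# One value per row: the size of that row's city group.
group_sizes = df.groupby("city")["city"].transform("count")

# Keep rows whose group has more than 3 members. .copy() detaches the result
# from df so that the assignment below does not trigger SettingWithCopyWarning.
sub_df = df[group_sizes > 3].copy()
sub_df["keep"] = True
print(sub_df)
```

Unlike `filter`, this approach does not call a Python function per group, so it is typically faster when there are many distinct cities.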