I'm working in Python with a pandas DataFrame of video games, each with a genre. I'm trying to remove any video game whose genre appears fewer than some number of times in the DataFrame, but I have no clue how to go about this. I did find a StackOverflow question that seems to be related, but I can't decipher the solution at all (possibly because I've never heard of R and my memory of functional programming is rusty at best).
Help?
Using query() to filter by column value in a pandas DataFrame: the query() function filters rows based on column values. It returns a new DataFrame after applying the expression. To update the existing DataFrame instead, pass inplace=True.
Filtering rows by condition: you can use df[df["Courses"] == 'Spark'] to filter rows by a condition in a pandas DataFrame. Note that this expression returns a new DataFrame with the selected rows.
Syntax for value_counts on a pandas DataFrame: type the name of the DataFrame followed by .value_counts(). When called on a DataFrame, value_counts counts the number of rows for each unique combination of values across the columns.
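To make the two filtering styles described above concrete, here is a minimal sketch. The data and the "Fee" column are made up for illustration; only the "Courses" column name comes from the text.

```python
import pandas as pd

# Hypothetical sample data; "Fee" is an invented column for illustration.
df = pd.DataFrame({
    "Courses": ["Spark", "Pandas", "Spark"],
    "Fee": [20000, 25000, 22000],
})

# Boolean indexing: builds a True/False mask and returns a new DataFrame.
by_mask = df[df["Courses"] == "Spark"]

# query(): the same condition written as a string expression.
# It also returns a new DataFrame unless inplace=True is passed.
by_query = df.query("Courses == 'Spark'")
```

Both calls leave df itself untouched, so the result has to be assigned to a name if you want to keep it.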
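For the genre question, the per-column form Series.value_counts is probably the more useful one. The sketch below shows both forms on made-up data; the column names "genre" and "platform" are assumptions, and DataFrame.value_counts needs pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical data; column names are assumptions for illustration.
games = pd.DataFrame({
    "genre": ["RPG", "RPG", "RPG", "Puzzle"],
    "platform": ["PC", "PC", "Console", "PC"],
})

# Series.value_counts: how many rows each genre has.
genre_counts = games["genre"].value_counts()

# DataFrame.value_counts: how many rows each (genre, platform) pair has.
pair_counts = games.value_counts()
```

Here genre_counts is indexed by genre, while pair_counts is indexed by (genre, platform) tuples.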
Use groupby filter:
In [11]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])

In [12]: df
Out[12]:
   A  B
0  1  2
1  1  4
2  5  6

In [13]: df.groupby("A").filter(lambda x: len(x) > 1)
Out[13]:
   A  B
0  1  2
1  1  4
I recommend reading the split-apply-combine section of the docs.
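Applied to the question's video game data, the groupby filter approach might look like the sketch below. The column names "title" and "genre", the sample rows, and the min_count threshold are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical data standing in for the asker's DataFrame.
games = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E", "F"],
    "genre": ["RPG", "RPG", "RPG", "Puzzle", "Shooter", "Shooter"],
})

# Assumed threshold: drop genres that appear fewer than min_count times.
min_count = 2

# filter() calls the lambda once per genre group and keeps every row
# of the groups for which it returns True.
kept = games.groupby("genre").filter(lambda g: len(g) >= min_count)
```

With this data the single "Puzzle" row is dropped and the other five rows keep their original index labels.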
For better performance, use GroupBy.transform with 'size' to get the count per group as a Series the same length as the original df, so you can filter by boolean indexing:
df1 = df[df.groupby("A")['A'].transform('size') > 1]
Or use Series.map with Series.value_counts:
df1 = df[df['A'].map(df['A'].value_counts()) > 1]
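As a rough sketch of how the two one-liners above carry over to the genre question, the helper below wraps the map/value_counts version and compares it with the transform version. The function name drop_rare, the column names, and the sample data are hypothetical.

```python
import pandas as pd

def drop_rare(df, column, min_count):
    """Keep rows whose value in `column` occurs at least `min_count` times."""
    # Map each row's value to the number of times that value occurs.
    counts = df[column].map(df[column].value_counts())
    return df[counts >= min_count]

# Hypothetical data standing in for the asker's DataFrame.
games = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E", "F"],
    "genre": ["RPG", "RPG", "RPG", "Puzzle", "Shooter", "Shooter"],
})

kept_map = drop_rare(games, "genre", min_count=3)

# Same filter via transform('size'): one group size per original row.
sizes = games.groupby("genre")["genre"].transform("size")
kept_transform = games[sizes >= 3]
```

Both versions build a per-row count aligned with the original index, so the comparison against the threshold is an ordinary boolean mask.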