Remove groups by condition

Suppose I have the following data frame:

using DataFrames
df = DataFrame(A = 1:10, B = ["a","a","b","b","b","c","c","c","c","d"])
grouped_df = groupby(df, "B")

This gives four groups. How can I drop the groups that have fewer than, say, 2 rows? For example, how can I keep only groups a, b, and c? I can easily do it with a for loop, but I don't think that is the optimal way.

user1691278 asked Mar 04 '21 23:03


1 Answer

If you want the result to remain grouped, filter is the simplest option:

julia> filter(x -> nrow(x) > 1, grouped_df)
GroupedDataFrame with 3 groups based on key: B
First Group (2 rows): B = "a"
 Row │ A      B
     │ Int64  String
─────┼───────────────
   1 │     1  a
   2 │     2  a
⋮
Last Group (4 rows): B = "c"
 Row │ A      B
     │ Int64  String
─────┼───────────────
   1 │     6  c
   2 │     7  c
   3 │     8  c
   4 │     9  c

If you want a data frame as the result of a single operation, you can use combine, which here returns an empty DataFrame for every group that should be dropped:

julia> combine(grouped_df, x -> nrow(x) < 2 ? DataFrame() : x)
9×2 DataFrame
 Row │ B       A
     │ String  Int64
─────┼───────────────
   1 │ a           1
   2 │ a           2
   3 │ b           3
   4 │ b           4
   5 │ b           5
   6 │ c           6
   7 │ c           7
   8 │ c           8
   9 │ c           9
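
The two approaches above can also be chained. The following is a minimal sketch, assuming a recent DataFrames.jl version in which the DataFrame constructor accepts a GroupedDataFrame; the names kept and result are introduced here for illustration only:

```julia
using DataFrames

df = DataFrame(A = 1:10, B = ["a","a","b","b","b","c","c","c","c","d"])
grouped_df = groupby(df, "B")

# Keep only the groups with at least 2 rows; the result is still grouped.
kept = filter(x -> nrow(x) >= 2, grouped_df)

# Flatten the surviving groups back into an ordinary data frame
# (assumes the DataFrame(::GroupedDataFrame) constructor is available).
result = DataFrame(kept)
```

This may be convenient when the grouped result is needed for further per-group work and the flat data frame is only wanted at the end.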
Bogumił Kamiński answered Oct 05 '22 18:10