Suppose I have the following dataframe
using DataFrames
df = DataFrame(A = 1:10, B = ["a","a","b","b","b","c","c","c","c","d"])
grouped_df = groupby(df, "B")
I would have four groups. How can I drop the groups that have fewer than, say, 2 rows? For example, how can I keep only groups a
,b
, and c
? I can easily do it with a for loop, but I don't think the optimal way.
DataFrame. drop() method you can remove/delete/drop the list of rows from pandas, all you need to provide is a list of rows indexes or labels as a param to this method. By default drop() method removes the rows and returns a copy of the updated DataFrame instead of replacing the existing referring DataFrame.
The Hello, World! of pandas GroupBy You call .groupby() and pass the name of the column that you want to group on, which is "state" . Then, you use ["last_name"] to specify the columns on which you want to perform the actual aggregation.
You can delete one or multiple columns of a DataFrame. To delete or remove only one column from Pandas DataFrame, you can use either del keyword, pop() function or drop() function on the dataframe. To delete multiple columns from Pandas Dataframe, use drop() function on the dataframe.
If you want the result to be still grouped then filter
is simplest:
julia> filter(x -> nrow(x) > 1, grouped_df)
GroupedDataFrame with 3 groups based on key: B
First Group (2 rows): B = "a"
Row │ A B
│ Int64 String
─────┼───────────────
1 │ 1 a
2 │ 2 a
⋮
Last Group (4 rows): B = "c"
Row │ A B
│ Int64 String
─────┼───────────────
1 │ 6 c
2 │ 7 c
3 │ 8 c
4 │ 9 c
If you want to get a data frame as a result of one operation then do e.g.:
julia> combine(grouped_df, x -> nrow(x) < 2 ? DataFrame() : x)
9×2 DataFrame
Row │ B A
│ String Int64
─────┼───────────────
1 │ a 1
2 │ a 2
3 │ b 3
4 │ b 4
5 │ b 5
6 │ c 6
7 │ c 7
8 │ c 8
9 │ c 9
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With