I am trying to make a groupby + sum on a Julia Dataframe with Int and String values
For instance, df :
│ Row │ A │ B │ C │ D │
│ │ String │ String │ Int64 │ String │
├─────┼────────┼────────┼───────┼────────┤
│ 1 │ x1 │ a │ 12 │ green │
│ 2 │ x2 │ a │ 7 │ blue │
│ 3 │ x1 │ b │ 5 │ red │
│ 4 │ x2 │ a │ 4 │ blue │
│ 5 │ x1 │ b │ 9 │ yellow │
To do this in Python, the command could be :
df_group = df.groupby(['A', 'B']).sum().reset_index()
I will obtain the following output result with the initial column labels :
A B C
0 x1 a 12
1 x1 b 14
2 x2 a 11
I would like to do the same thing in Julia. I tried this way, unsuccessfully :
df_group = aggregate(df, ["A", "B"], sum)
MethodError: no method matching +(::String, ::String)
Have you any idea of a way to do this in Julia ?
Try (actually instead of non-string columns, probably you want columns that are numeric):
numcols = names(df, findall(x -> eltype(x) <: Number, eachcol(df)))
combine(groupby(df, ["A", "B"]), numcols .=> sum .=> numcols)
and if you want to allow missing
values (and skip them when doing a summation) then:
numcols = names(df, findall(x -> eltype(x) <: Union{Missing,Number}, eachcol(df)))
combine(groupby(df, ["A", "B"]), numcols .=> sum∘skipmissing .=> numcols)
Julia DataFrames support split-apply-combine logic, similar to pandas, so aggregation looks like
using DataFrames
df = DataFrame(:A => ["x1", "x2", "x1", "x2", "x1"],
:B => ["a", "a", "b", "a", "b"],
:C => [12, 7, 5, 4, 9],
:D => ["green", "blue", "red", "blue", "yellow"])
gdf = groupby(df, [:A, :B])
combine(gdf, :C => sum)
with the result
julia> combine(gdf, :C => sum)
3×3 DataFrame
│ Row │ A │ B │ C_sum │
│ │ String │ String │ Int64 │
├─────┼────────┼────────┼───────┤
│ 1 │ x1 │ a │ 12 │
│ 2 │ x2 │ a │ 11 │
│ 3 │ x1 │ b │ 14 │
You can skip the creation of gdf
with the help of Pipe.jl or Underscores.jl
using Underscores
@_ groupby(df, [:A, :B]) |> combine(__, :C => sum)
You can give name to the new column with the following syntax
julia> @_ groupby(df, [:A, :B]) |> combine(__, :C => sum => :C)
3×3 DataFrame
│ Row │ A │ B │ C │
│ │ String │ String │ Int64 │
├─────┼────────┼────────┼───────┤
│ 1 │ x1 │ a │ 12 │
│ 2 │ x2 │ a │ 11 │
│ 3 │ x1 │ b │ 14 │
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With