I am wondering how to use the "by" function to group by and sum multiple columns. If I wanted to groupby one column I can do it by this
someData = DataFrame(:Countries => ["Afganistan","Albainia","Albainia","Andorra","Angola","Angola"],:population => rand(100:1000,6), :GDP => rand(1:100,6))
by(someData, :Countries, df ->DataFrame(pop_sum = sum(df[:population])))
However, I wanted to get the sum of both population and GDP. I tried something like below which is of course incorrect. Any ideas?
by(someData, :Countries, df ->DataFrame(pop_sum, GDP_sum = sum(df[[:population,:GDP]])))
Do not use by function as it is deprecated. Instead use this (you do not see the warning as probably you are starting Julia with --depwarn set to no which is the default):
julia> someData = DataFrame(:Countries => ["Afganistan","Albainia","Albainia","Andorra","Angola","Angola"],
:population => rand(100:1000,6),
:GDP => rand(1:100,6))
6×3 DataFrame
│ Row │ Countries │ population │ GDP │
│ │ String │ Int64 │ Int64 │
├─────┼────────────┼────────────┼───────┤
│ 1 │ Afganistan │ 543 │ 29 │
│ 2 │ Albainia │ 853 │ 71 │
│ 3 │ Albainia │ 438 │ 81 │
│ 4 │ Andorra │ 860 │ 88 │
│ 5 │ Angola │ 940 │ 64 │
│ 6 │ Angola │ 688 │ 40 │
julia> combine(groupby(someData, :Countries), [:population, :GDP] .=> sum)
4×3 DataFrame
│ Row │ Countries │ population_sum │ GDP_sum │
│ │ String │ Int64 │ Int64 │
├─────┼────────────┼────────────────┼─────────┤
│ 1 │ Afganistan │ 543 │ 29 │
│ 2 │ Albainia │ 1291 │ 152 │
│ 3 │ Andorra │ 860 │ 88 │
│ 4 │ Angola │ 1628 │ 104 │
The alternative way to write it would be:
julia> combine(groupby(someData, :Countries)) do sdf
return (population_sum = sum(sdf.population), GDP_sum=sum(sdf.GDP))
end
4×3 DataFrame
│ Row │ Countries │ population_sum │ GDP_sum │
│ │ String │ Int64 │ Int64 │
├─────┼────────────┼────────────────┼─────────┤
│ 1 │ Afganistan │ 543 │ 29 │
│ 2 │ Albainia │ 1291 │ 152 │
│ 3 │ Andorra │ 860 │ 88 │
│ 4 │ Angola │ 1628 │ 104 │
but it is more verbose in this case (it would be useful if you wanted to do more complex preprocessing of the data before returning the value).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With