I have a Julia DataFrame
using DataFrames
df = DataFrame(a = [1,1,1,2,2,2,2], b = 1:7)
7×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     1      2
   3 │     1      3
   4 │     2      4
   5 │     2      5
   6 │     2      6
   7 │     2      7
and want to create a new column that contains the row number within each group. It should look like this:
7×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      1      1
   2 │     1      2      2
   3 │     1      3      3
   4 │     2      4      1
   5 │     2      5      2
   6 │     2      6      3
   7 │     2      7      4
I am open to any solution, but I am especially looking for a DataFramesMeta solution that works nicely with the Chain package. R's dplyr has a simple function for this, row_number() (n() gives the group size, so 1:n() also works). I feel like there must be something similar in Julia.
Do:
julia> using DataFrames, DataFramesMeta
julia> df = DataFrame(a = [1,1,1,2,2,2,2], b = 1:7)
7×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     1      2
   3 │     1      3
   4 │     2      4
   5 │     2      5
   6 │     2      6
   7 │     2      7
julia> @chain df begin
           groupby(:a)
           @transform(:c = eachindex(:b))
       end
7×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      1      1
   2 │     1      2      2
   3 │     1      3      3
   4 │     2      4      1
   5 │     2      5      2
   6 │     2      6      3
   7 │     2      7      4
In the upcoming DataFrames.jl 1.4 release this will be even simpler; see https://github.com/JuliaData/DataFrames.jl/pull/3001.
(The difference is that you will no longer have to pass a column name such as :b; you will be able to write :c = $eachindex instead.)
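If you prefer not to use the macros, the same grouped transformation can be sketched with plain DataFrames.jl. This is only a minimal sketch: the anonymous function `x -> 1:length(x)` and the result name `res` are my own choices for illustration, and the column :b is used only to obtain the length of each group.

```julia
using DataFrames

df = DataFrame(a = [1,1,1,2,2,2,2], b = 1:7)

# For each group, return the range 1:n, where n is the number of rows
# in that group. Any column could be passed here instead of :b, since
# only its length within the group matters.
res = transform(groupby(df, :a), :b => (x -> 1:length(x)) => :c)
```

Because transform keeps the original row order, `res.c` lines up row by row with `df`, also when the groups are not stored contiguously.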
In DataFrames.jl 1.4 you can just write:
julia> transform(groupby(df, :a), eachindex => :c)
7×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      1      1
   2 │     1      2      2
   3 │     1      3      3
   4 │     2      4      1
   5 │     2      5      2
   6 │     2      6      3
   7 │     2      7      4
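To make explicit what "row number per group" computes, here is a minimal sketch in plain Julia with no packages. It is not how DataFrames.jl implements this; the names `counts` and `c` are hypothetical and chosen only for illustration.

```julia
a = [1, 1, 1, 2, 2, 2, 2]

# counts[key] holds how many rows with this group key have been seen so far.
counts = Dict{Int,Int}()
c = similar(a)

for (i, key) in pairs(a)
    counts[key] = get(counts, key, 0) + 1
    c[i] = counts[key]   # position of row i within its group
end
```

Since the counter is kept per key, this also handles groups whose rows are interleaved rather than contiguous.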