I want to extract the 3rd and 7th rows of a data frame in Julia. The MWE is:
using DataFrames
my_data = DataFrame(A = 1:10, B = 16:25);
my_data
10×2 DataFrame
│ Row │ A │ B │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 16 │
│ 2 │ 2 │ 17 │
│ 3 │ 3 │ 18 │
│ 4 │ 4 │ 19 │
│ 5 │ 5 │ 20 │
│ 6 │ 6 │ 21 │
│ 7 │ 7 │ 22 │
│ 8 │ 8 │ 23 │
│ 9 │ 9 │ 24 │
│ 10 │ 10 │ 25 │
Index the data frame with a vector of row numbers, and use : to keep all columns:
using DataFrames
my_data = DataFrame(A = 1:10, B = 16:25);
my_data[[3, 7], :]
2×2 DataFrame
│ Row │ A │ B │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 3 │ 18 │
│ 2 │ 7 │ 22 │
A nice feature of Julia is that you do not have to materialize the result, which saves the memory and time otherwise spent copying the data. If you only need a subset of an array-like structure, it is usually better to create a view with @view than to materialize a copy:
julia> @view my_data[[3, 7], :]
2×2 SubDataFrame
│ Row │ A │ B │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 3 │ 18 │
│ 2 │ 7 │ 22 │
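The difference between a copy and a view is easiest to see on a plain vector. The following is a minimal sketch using only Base Julia; the names a, c and v are illustrative and not part of the original example.

```julia
a = collect(1:10)        # plain Vector{Int}
c = a[[3, 7]]            # regular indexing allocates a new, independent vector
v = @view a[[3, 7]]      # a view allocates no new data and refers back to a

c[1] = 100               # changes only the copy
v[2] = -7                # writes through the view into a[7]

println(a[3])            # still 3, the copy did not touch a
println(a[7])            # -7, the view modified the parent
println(parent(v) === a) # true, the view wraps the original vector
```

Because a view aliases its parent, writing to it changes the original data. That is what makes it cheap, and also why a copy is sometimes the safer choice.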
Now the performance testing.
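using Statistics  # provides mean
# Materializes a new 2-row DataFrame, then averages column A
function submean1(df)
    d = df[[3, 7], :]
    mean(d.A)
end
# Averages column A through a view of the same rows, without copying them
function submean2(df)
    d = @view df[[3, 7], :]
    mean(d.A)
end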
And tests:
julia> using BenchmarkTools
julia> @btime submean1($my_data)
689.262 ns (19 allocations: 1.38 KiB)
5.0
julia> @btime submean2($my_data)
582.315 ns (9 allocations: 288 bytes)
5.0
Even in this simplistic example, @view is about 15% faster and allocates nearly five times less memory (288 bytes versus 1.38 KiB). Of course, sometimes you do want a copy of the data, but as a rule of thumb, avoid materializing when a view will do.
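For the cases where a copy is wanted, a view can still be materialized later. The following is a minimal sketch, assuming a recent DataFrames.jl in which copy applied to a SubDataFrame returns an independent DataFrame; the names df, v and snapshot are illustrative only.

```julia
using DataFrames

df = DataFrame(A = collect(1:10), B = collect(16:25))
v = @view df[[3, 7], :]    # SubDataFrame that shares its data with df
snapshot = copy(v)         # independent DataFrame holding the same two rows

df.A[3] = 100              # modify the parent after taking the snapshot

println(v.A[1])            # the view reflects the change made to df
println(snapshot.A[1])     # the copy keeps the value it had when it was made
```

Working through the view and copying only at the point where independence from the parent matters keeps the extra allocation to the places that need it.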