
How to extract particular rows from a data frame in Julia?

Tags:

julia

I want to extract the 3rd and 7th rows of a data frame in Julia. The MWE is:

using DataFrames
my_data = DataFrame(A = 1:10, B = 16:25);
my_data

10×2 DataFrame
│ Row │ A     │ B     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 16    │
│ 2   │ 2     │ 17    │
│ 3   │ 3     │ 18    │
│ 4   │ 4     │ 19    │
│ 5   │ 5     │ 20    │
│ 6   │ 6     │ 21    │
│ 7   │ 7     │ 22    │
│ 8   │ 8     │ 23    │
│ 9   │ 9     │ 24    │
│ 10  │ 10    │ 25    │
Qwerty asked Sep 01 '25 01:09



2 Answers

This should give you the expected output:

using DataFrames
my_data = DataFrame(A = 1:10, B = 16:25);
# Index rows with a vector of row numbers; `:` selects all columns.
my_data[[3, 7], :]

2×2 DataFrame
│ Row │ A     │ B     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 3     │ 18    │
│ 2   │ 7     │ 22    │
Qwerty answered Sep 03 '25 19:09



A nice feature of Julia is that you do not need to materialize the result, which saves the memory and time otherwise spent copying the data. If you need a subset of any array-like structure, it is usually better to use @view than to materialize a copy:

julia> @view my_data[[3, 7], :]
2×2 SubDataFrame
│ Row │ A     │ B     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 3     │ 18    │
│ 2   │ 7     │ 22    │

Now a quick performance comparison. Define two functions, one that copies and one that uses a view:

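using Statistics  # provides mean

# Materializes a copy of rows 3 and 7, then averages column A.
function submean1(df)
    d = df[[3, 7], :]
    mean(d.A)
end

# Same computation on a non-copying view of rows 3 and 7.
function submean2(df)
    d = @view df[[3, 7], :]
    mean(d.A)
end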

And the benchmarks:

julia> using BenchmarkTools

julia> @btime submean1($my_data)
  689.262 ns (19 allocations: 1.38 KiB)
5.0

julia> @btime submean2($my_data)
  582.315 ns (9 allocations: 288 bytes)
5.0

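Even in this simple example, @view is about 15% faster and allocates roughly a fifth of the memory (288 bytes versus 1.38 KiB, in 9 allocations instead of 19). Of course, sometimes you do want a copy of the data, but the rule of thumb is not to materialize unless you need to.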
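One case where you do want the copy is when you plan to modify the extracted rows without touching the original. The sketch below illustrates the difference. The variable names `copied` and `viewed` are made up for this example, and it assumes a recent DataFrames.jl where the constructor stores ranges as ordinary mutable vectors:

```julia
using DataFrames

my_data = DataFrame(A = 1:10, B = 16:25)

copied = my_data[[3, 7], :]        # independent copy of rows 3 and 7
viewed = @view my_data[[3, 7], :]  # SubDataFrame sharing storage with my_data

copied.A[1] = 100   # changes only the copy
viewed.B[1] = -1    # writes through to row 3 of my_data

my_data.A[3]  # unchanged by the write to `copied`
my_data.B[3]  # changed by the write to `viewed`
```

So a view is cheaper, but writes through it modify the parent data frame. Use the copying form when the subset needs a life of its own.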

Przemyslaw Szufel answered Sep 03 '25 19:09
