Julia DataFrame multiple values filtering

Question

There are two ways to filter a DataFrame in the case below:

1. df = df[((df[:field].==1) | (df[:field].==2)), :]
2. df = df[[in(v, [1, 2]) for v in df[:field]], :]

The second approach is slower, but it suits a set of values that can change at run time. Is there any syntactic sugar I missed that would make it as fast as the first approach while keeping an in-like construction?

Reza Afzalan · Accepted Answer

julia> using DataFrames

The findin function could be another way to do the task:

julia> function t_findin(df::DataFrames.DataFrame)
        df[findin(df[:A],[1,2]), :]
       end
t_findin (generic function with 1 method)
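
Note for newer Julia versions: findin was removed in Julia 1.0, and findall(in(values), column) is the usual replacement. A minimal sketch on a plain vector, using Base only (the names column and wanted are illustrative, not from the answer):

```julia
# Illustrative data; `column` stands in for one DataFrame column.
column = [1, 5, 2, 7, 1, 30]
wanted = [1, 2]

# Indices of the elements that occur in `wanted` (the role findin plays above).
idx = findall(in(wanted), column)

# The same selection as a Boolean mask; Ref stops `wanted` from being broadcast over.
mask = in.(column, Ref(wanted))

# Either form can be used to index the column (or the rows of a DataFrame).
selected = column[idx]
```

Because wanted is an ordinary variable, the set of values can change at run time without rewriting the condition.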

array comprehensions:

julia> function t_compr(df::DataFrames.DataFrame)
        df[[in(v, [1, 2]) for v in df[:A]], :]
       end
t_compr (generic function with 1 method)

multiple conditions:

julia> function t_mconds(df::DataFrames.DataFrame)
        df[((df[:A].==1) | (df[:A].==2)), :]
       end
t_mconds (generic function with 1 method)

Test data

julia> df = DataFrame();
julia> df[:B] = rand(1:30,10_000_000);
julia> df[:A] = rand(1:30,10_000_000);

Test results

julia> @time t_findin(df);
  0.489064 seconds (67 allocations: 19.340 MB, 0.49% gc time)

julia> @time t_mconds(df);
  0.222389 seconds (106 allocations: 78.933 MB, 5.98% gc time)

julia> @time t_compr(df);
 23.634846 seconds (100.00 M allocations: 2.563 GB, 1.47% gc time)
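
For readers on a current Julia and DataFrames.jl, the same three ideas can be written roughly as below. There, df[:A] is usually spelled df.A, and element-wise "or" needs the dotted .| operator. This is a sketch on a tiny made-up frame, not a re-run of the benchmark above, and the Set named wanted is an assumption standing in for the changing list of values:

```julia
using DataFrames

# Small illustrative frame; the benchmark above used 10_000_000 random rows.
df = DataFrame(A = [1, 2, 3, 1, 5], B = [10, 20, 30, 40, 50])

# Built once, outside any loop, so it is not reallocated for every row.
wanted = Set([1, 2])

# Multiple conditions, with dotted operators for element-wise comparison.
by_conds = df[(df.A .== 1) .| (df.A .== 2), :]

# in-like construction as a broadcast mask; Ref treats `wanted` as a scalar.
by_mask = df[in.(df.A, Ref(wanted)), :]

# The same test expressed through filter with a column => predicate pair.
by_filter = filter(:A => in(wanted), df)
```

All three select the same rows here. How they compare in speed depends on the Julia and DataFrames.jl versions, so it is worth timing them on your own data.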
