Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I obtain the complement of list of indexes in Julia?

Suppose that I have a variable i = [1,3,5], which was obtained when I applied a filter to an array. Now, suppose that this array has 10 elements, and I'd like to obtain the "complement" indexes. I mean, I want to obtain ic = [2,4,6,7,8,9,10].

Is there a clean and short way of obtaining this complement list of indexes?

I mean, I can do this with a regular loop, but is there a way of doing this with list comprehension?

like image 508
Davi Barreira Avatar asked Nov 22 '20 17:11

Davi Barreira


People also ask

How do you find the index of an element in a list Julia?

Accessing element at a specific index in Julia – getindex() Method. The getindex() is an inbuilt function in julia which is used to construct array of the specified type. This function is also used to get the element of the array at a specific index.

Is Julia 1 or 0 indexed?

Conventionally, Julia's arrays are indexed starting at 1, whereas some other languages start numbering at 0, and yet others (e.g., Fortran) allow you to specify arbitrary starting indices.


5 Answers

You can use setdiff function

julia> x = [1, 3, 5];
julia> y = collect(1:10);
julia> setdiff(y, x)
7-element Vector{Int64}:
  2
  4
  6
  7
  8
  9
 10

Performance-wise, loop-based implementation is better, because we can take into account that new indices are a subset of the original

function mysetdiff(y, x)
    res = Vector{eltype(y)}(undef, length(y) - length(x))
    i = 1
    @inbounds for el in y
        el ∈ x && continue
        res[i] = el
        i += 1
    end

    res
end

and comparison

using BenchmarkTools

@btime [z for z ∈ $y if z ∉ $x]
# 141.893 ns (4 allocations: 288 bytes)
@btime setdiff($y, $x)
# 477.056 ns (8 allocations: 688 bytes)
@btime mysetdiff($y, $x)
# 46.434 ns (1 allocation: 144 bytes)
like image 65
Andrej Oskin Avatar answered Oct 17 '22 19:10

Andrej Oskin


You need to get all elements in 1:10 that aren't in i. So using list comprehension:

julia> i = [1,3,5];

julia> ic = [x for x ∈ 1:10 if x ∉ i]
7-element Array{Int64,1}:
  2
  4
  6
  7
  8
  9
 10

You can type ∈ in REPL with \in Tab. ∉ can be achieved with \notin Tab. If you're coding in an environment that don't allow this, you can just copy from here or type one of these instead:

julia> ic = [x for x in 1:10 if !(x in i)]
julia> ic = [x for x in 1:10 if !in(x, i)] # operators are functions

Performance

If you care about performance, here the benchmark:

julia> @benchmark ic = [x for x ∈ 1:10 if x ∉ i]
BenchmarkTools.Trial: 
  memory estimate:  288 bytes
  allocs estimate:  4
  --------------
  minimum time:     462.274 ns (0.00% GC)
  median time:      471.168 ns (0.00% GC)
  mean time:        497.947 ns (2.86% GC)
  maximum time:     13.115 μs (95.25% GC)
  --------------
  samples:          10000
  evals/sample:     197
like image 40
Héliton Martins Avatar answered Oct 17 '22 21:10

Héliton Martins


If you care about performance you can also consider:

julia> @benchmark deleteat!([1:10;], $i) # indices must be unique and sorted
BenchmarkTools.Trial:
  memory estimate:  160 bytes
  allocs estimate:  1
  --------------
  minimum time:     53.798 ns (0.00% GC)
  median time:      60.790 ns (0.00% GC)
  mean time:        71.125 ns (1.76% GC)
  maximum time:     618.946 ns (77.28% GC)
  --------------
  samples:          10000
  evals/sample:     987

and

julia> @benchmark (x = trues(10); x[$i] .= false; findall(x)) # if Bool-array is enough for you you can skip the last step to save 50% of time
BenchmarkTools.Trial:
  memory estimate:  272 bytes
  allocs estimate:  3
  --------------
  minimum time:     124.863 ns (0.00% GC)
  median time:      134.033 ns (0.00% GC)
  mean time:        157.668 ns (6.84% GC)
  maximum time:     3.000 μs (95.44% GC)
  --------------
  samples:          10000
  evals/sample:     905

vs earlier proposed:

julia> @benchmark ic = [x for x ∈ 1:10 if x ∉ $i]
BenchmarkTools.Trial:
  memory estimate:  288 bytes
  allocs estimate:  4
  --------------
  minimum time:     170.714 ns (0.00% GC)
  median time:      196.571 ns (0.00% GC)
  mean time:        222.510 ns (4.43% GC)
  maximum time:     3.078 μs (91.01% GC)
  --------------
  samples:          10000
  evals/sample:     700

and

julia> @benchmark setdiff($[1:10;], $i)
BenchmarkTools.Trial:
  memory estimate:  672 bytes
  allocs estimate:  7
  --------------
  minimum time:     504.145 ns (0.00% GC)
  median time:      514.508 ns (0.00% GC)
  mean time:        589.584 ns (2.75% GC)
  maximum time:     8.954 μs (90.69% GC)
  --------------
  samples:          10000
  evals/sample:     193

and (custom implementation from the other post)

julia> @benchmark mysetdiff($[1:10;], $i)
BenchmarkTools.Trial:
  memory estimate:  144 bytes
  allocs estimate:  1
  --------------
  minimum time:     44.748 ns (0.00% GC)
  median time:      46.869 ns (0.00% GC)
  mean time:        52.780 ns (1.75% GC)
  maximum time:     431.919 ns (88.49% GC)
  --------------
  samples:          10000
  evals/sample:     990
like image 36
Bogumił Kamiński Avatar answered Oct 17 '22 20:10

Bogumił Kamiński


Since you already used filtering, do it for the reverse too. Or if you have the function that you used to get i, you can just negate that to get the complement set of indices.

julia> i = [1,3,5];

julia> filter(x -> x ∉ i, 1:10)
7-element Array{Int64,1}:
  2
  4
  6
  7
  8
  9
 10

Benchmarks

@benchmark filter(x -> x ∉ i, 1:10)
BenchmarkTools.Trial:
  memory estimate:  304 bytes
  allocs estimate:  2
  --------------
  minimum time:     501.036 ns (0.00% GC)
  median time:      511.399 ns (0.00% GC)
  mean time:        546.609 ns (1.26% GC)
  maximum time:     7.516 μs (92.31% GC)
  --------------
  samples:          10000
  evals/sample:     193

Reference solution

@benchmark ic = [x for x ∈ 1:10 if x ∉ i]
BenchmarkTools.Trial:
  memory estimate:  288 bytes
  allocs estimate:  4
  --------------
  minimum time:     626.036 ns (0.00% GC)
  median time:      646.154 ns (0.00% GC)
  mean time:        692.179 ns (2.72% GC)
  maximum time:     26.065 μs (95.43% GC)
  --------------
  samples:          10000
  evals/sample:     169
like image 7
niczky12 Avatar answered Oct 17 '22 20:10

niczky12


One more worth-knowing-syntax to add to the list (although not too fast) from the InvertedIndices packages (commonly it also gets loaded together with DataFrames):

(1:10)[Not(i)]
julia> @benchmark (1:10)[Not($i)]
BenchmarkTools.Trial:
  memory estimate:  1.42 KiB
  allocs estimate:  39
  --------------
  minimum time:     1.130 μs (0.00% GC)
  median time:      1.320 μs (0.00% GC)
  mean time:        1.713 μs (5.05% GC)
  maximum time:     324.000 μs (98.81% GC)
  --------------
  samples:          10000
  evals/sample:     10
like image 6
Przemyslaw Szufel Avatar answered Oct 17 '22 20:10

Przemyslaw Szufel