Suppose that I have a variable i = [1,3,5]
, which was obtained when I applied a filter to an array. Now, suppose that this array has 10 elements, and I'd like to obtain the "complement" indexes. I mean, I want to obtain ic = [2,4,6,7,8,9,10]
.
Is there a clean and short way of obtaining this complement list of indexes?
I mean, I can do this with a regular loop, but is there a way of doing this with list comprehension?
Accessing element at a specific index in Julia – getindex() Method. The getindex() is an inbuilt function in julia which is used to construct array of the specified type. This function is also used to get the element of the array at a specific index.
Conventionally, Julia's arrays are indexed starting at 1, whereas some other languages start numbering at 0, and yet others (e.g., Fortran) allow you to specify arbitrary starting indices.
You can use setdiff
function
julia> x = [1, 3, 5];
julia> y = collect(1:10);
julia> setdiff(y, x)
7-element Vector{Int64}:
2
4
6
7
8
9
10
Performance-wise, loop-based implementation is better, because we can take into account that new indices are a subset of the original
function mysetdiff(y, x)
res = Vector{eltype(y)}(undef, length(y) - length(x))
i = 1
@inbounds for el in y
el ∈ x && continue
res[i] = el
i += 1
end
res
end
and comparison
using BenchmarkTools
@btime [z for z ∈ $y if z ∉ $x]
# 141.893 ns (4 allocations: 288 bytes)
@btime setdiff($y, $x)
# 477.056 ns (8 allocations: 688 bytes)
@btime mysetdiff($y, $x)
# 46.434 ns (1 allocation: 144 bytes)
You need to get all elements in 1:10
that aren't in i
. So using list comprehension:
julia> i = [1,3,5];
julia> ic = [x for x ∈ 1:10 if x ∉ i]
7-element Array{Int64,1}:
2
4
6
7
8
9
10
You can type ∈ in REPL with \in
Tab. ∉ can be achieved with \notin
Tab. If you're coding in an environment that don't allow this, you can just copy from here or type one of these instead:
julia> ic = [x for x in 1:10 if !(x in i)]
julia> ic = [x for x in 1:10 if !in(x, i)] # operators are functions
If you care about performance, here the benchmark:
julia> @benchmark ic = [x for x ∈ 1:10 if x ∉ i]
BenchmarkTools.Trial:
memory estimate: 288 bytes
allocs estimate: 4
--------------
minimum time: 462.274 ns (0.00% GC)
median time: 471.168 ns (0.00% GC)
mean time: 497.947 ns (2.86% GC)
maximum time: 13.115 μs (95.25% GC)
--------------
samples: 10000
evals/sample: 197
If you care about performance you can also consider:
julia> @benchmark deleteat!([1:10;], $i) # indices must be unique and sorted
BenchmarkTools.Trial:
memory estimate: 160 bytes
allocs estimate: 1
--------------
minimum time: 53.798 ns (0.00% GC)
median time: 60.790 ns (0.00% GC)
mean time: 71.125 ns (1.76% GC)
maximum time: 618.946 ns (77.28% GC)
--------------
samples: 10000
evals/sample: 987
and
julia> @benchmark (x = trues(10); x[$i] .= false; findall(x)) # if Bool-array is enough for you you can skip the last step to save 50% of time
BenchmarkTools.Trial:
memory estimate: 272 bytes
allocs estimate: 3
--------------
minimum time: 124.863 ns (0.00% GC)
median time: 134.033 ns (0.00% GC)
mean time: 157.668 ns (6.84% GC)
maximum time: 3.000 μs (95.44% GC)
--------------
samples: 10000
evals/sample: 905
vs earlier proposed:
julia> @benchmark ic = [x for x ∈ 1:10 if x ∉ $i]
BenchmarkTools.Trial:
memory estimate: 288 bytes
allocs estimate: 4
--------------
minimum time: 170.714 ns (0.00% GC)
median time: 196.571 ns (0.00% GC)
mean time: 222.510 ns (4.43% GC)
maximum time: 3.078 μs (91.01% GC)
--------------
samples: 10000
evals/sample: 700
and
julia> @benchmark setdiff($[1:10;], $i)
BenchmarkTools.Trial:
memory estimate: 672 bytes
allocs estimate: 7
--------------
minimum time: 504.145 ns (0.00% GC)
median time: 514.508 ns (0.00% GC)
mean time: 589.584 ns (2.75% GC)
maximum time: 8.954 μs (90.69% GC)
--------------
samples: 10000
evals/sample: 193
and (custom implementation from the other post)
julia> @benchmark mysetdiff($[1:10;], $i)
BenchmarkTools.Trial:
memory estimate: 144 bytes
allocs estimate: 1
--------------
minimum time: 44.748 ns (0.00% GC)
median time: 46.869 ns (0.00% GC)
mean time: 52.780 ns (1.75% GC)
maximum time: 431.919 ns (88.49% GC)
--------------
samples: 10000
evals/sample: 990
Since you already used filtering, do it for the reverse too. Or if you have the function that you used to get i
, you can just negate that to get the complement set of indices.
julia> i = [1,3,5];
julia> filter(x -> x ∉ i, 1:10)
7-element Array{Int64,1}:
2
4
6
7
8
9
10
@benchmark filter(x -> x ∉ i, 1:10)
BenchmarkTools.Trial:
memory estimate: 304 bytes
allocs estimate: 2
--------------
minimum time: 501.036 ns (0.00% GC)
median time: 511.399 ns (0.00% GC)
mean time: 546.609 ns (1.26% GC)
maximum time: 7.516 μs (92.31% GC)
--------------
samples: 10000
evals/sample: 193
@benchmark ic = [x for x ∈ 1:10 if x ∉ i]
BenchmarkTools.Trial:
memory estimate: 288 bytes
allocs estimate: 4
--------------
minimum time: 626.036 ns (0.00% GC)
median time: 646.154 ns (0.00% GC)
mean time: 692.179 ns (2.72% GC)
maximum time: 26.065 μs (95.43% GC)
--------------
samples: 10000
evals/sample: 169
One more worth-knowing-syntax to add to the list (although not too fast) from the InvertedIndices
packages (commonly it also gets loaded together with DataFrames
):
(1:10)[Not(i)]
julia> @benchmark (1:10)[Not($i)]
BenchmarkTools.Trial:
memory estimate: 1.42 KiB
allocs estimate: 39
--------------
minimum time: 1.130 μs (0.00% GC)
median time: 1.320 μs (0.00% GC)
mean time: 1.713 μs (5.05% GC)
maximum time: 324.000 μs (98.81% GC)
--------------
samples: 10000
evals/sample: 10
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With