Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently removing elements from unit range (Julia)

I'm trying to efficiently remove a vector of elements x from a unit range 1:m and then return a vector of the remaining elements.

For length(x) much smaller than m.

Here are the different methods I came up with,

using Distributions

function func1(m, x)
    for i in 1:1000
        collect(setdiff(1:m, x))
    end
end

function func2(m, x)
    for i in 1:1000
        filter(n -> !(n in x), 1:m)
    end
end

function func3(m, x)
    dict = Dict(zip(1:m, 1:m))
    for i in 1:1000
        d = copy(dict)
        for n in x
            delete!(d, n)
        end
        collect(keys(d))
    end
end

m = 10000
x = sample(1:m, 100)

@time func1(m, x)
@time func2(m, x)
@time func3(m, x)

Function 3 is about twice as fast as functions 1 and 2, however the result isn't sorted, which isn't a deal breaker for me, but I would prefer if the result was sorted.

Because I'm removing elements from a unit range, my intuition tells me that element look up (and deletion) can be made O(1), and thus there should be an algorithm which scales O(len(x)), rather than what I seem to be getting which is O(m) complexity.

like image 544
Thoth Avatar asked May 24 '26 02:05

Thoth


1 Answers

If m is much larger than length of x (i.e. you leave most of the elements) then you can consider this:

function func4(m, x)
    res = Vector{Vector{Int}}(undef, 1000)
    for i in 1:1000
        ind = trues(m)
        ind[x] .= false
        res[i] = findall(ind)
    end
    return res
end

as it should be faster.

(you could be faster if you e.g. knew that x is sorted and unique - and maybe in your original problem you know this or again x is small enough that sorting it and making unique is almost no cost relative to creation of the result)

I have added res on purpose - and I recommend you to add it also in your methods. The reason is that you run at a risk that the compiler notices that your function has no side effects and optimizes out the whole loop as no-op. Here is an example of this happening:

julia> function f()
           for i in 1:1_000_000_000
               s = i
           end
       end
f (generic function with 1 method)

julia> @code_native f()
    .text
; ┌ @ REPL[163]:2 within `f'
    retq
    nopw    %cs:(%rax,%rax)
    nopl    (%rax,%rax)
; └
like image 109
Bogumił Kamiński Avatar answered May 25 '26 17:05

Bogumił Kamiński



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!