Check if two sets have a non-empty intersection

Tags:

julia

I wrote a function intersects that efficiently checks wether two sets have a non-empty intersection (at least, more efficient than checking the size of their intersection).

It works well, but now I would like to specialize this function for the type DataStructures.IntSet from the DataStructures library. I wrote the function below, which works, but is a bit messy.

As you can see, when the attribute inverse is true, I have to negate the current chunk. In an attempt to simplify the code, I wrote the function intersects2 which does the same work but without the need of multiples if/else.

But this code appears to be less efficient than the first one. I'm not sure, but I think the problem is that each call to op_u or op_v copy the parameter, as shown in the output below.

How can I rewrite this function such that it does not make any copy (ie. no allocation) and without several imbricated if/else? The full code, benchmarks and results can be found below.

using Random
using DataStructures
using BenchmarkTools

elems = [randstring(8) for i in 1:10000]
ii = rand(1:10000, 3000)
jj = rand(1:10000, 3000)


x1 = Set(elems[ii])
y1 = Set(elems[jj])

x2 = Set(ii)
y2 = Set(jj)

x3 = DataStructures.IntSet(ii)
y3 = DataStructures.IntSet(jj)

function intersects(u, v)
    for x in u
        if x in v
            return true
        end
    end
    false
end

function intersects(u::DataStructures.IntSet, v::DataStructures.IntSet)
    ch_u, ch_v = u.bits.chunks, v.bits.chunks
    for i in 1:length(ch_u)
        if u.inverse
            if v.inverse
                if ~ch_u[i] & ~ch_v[i] > 0
                    return true
                end
            else
                if ~ch_u[i] & ch_v[i] > 0
                    return true
                end
            end
        else
            if v.inverse
                if ch_u[i] & ~ch_v[i] > 0
                    return true
                end
            else
                if ch_u[i] & ch_v[i] > 0
                    return true
                end
            end
        end
    end
    false
end

function intersects2(u::DataStructures.IntSet, v::DataStructures.IntSet)
    op_u = if u.inverse x->~x else x->x end
    op_v = if v.inverse x->~x else x->x end

    ch_u, ch_v = u.bits.chunks, v.bits.chunks
    for i in 1:length(ch_u)
        if op_u(ch_u[i]) & op_v(ch_v[i]) > 0
            return true
        end
    end
    false
end


println("Set{String}")
@btime intersects($x1, $y1)

println("Set{Int}")
@btime intersects($x2, $y2)

println("IntSet")
@btime intersects($x3, $y3)
@btime intersects2($x3, $y3)

Set{String}
  190.163 ns (0 allocations: 0 bytes)
Set{Int}
  17.935 ns (0 allocations: 0 bytes)
IntSet
  7.099 ns (0 allocations: 0 bytes)
  90.000 ns (5 allocations: 80 bytes)

806

asked Feb 15 '20 23:02

GaaH

2 Answers

The overhead you are seeing is likely due to function call overhead: op_u is not being inlined.

This version inlines correctly and has the same performance as intersects:

julia> function intersects2(u::DataStructures.IntSet, v::DataStructures.IntSet)
           op_u(x) = u.inverse ? ~x : x
           op_v(x) = v.inverse ? ~x : x

           ch_u, ch_v = u.bits.chunks, v.bits.chunks
           for i in 1:length(ch_u)
               if op_u(ch_u[i]) & op_v(ch_v[i]) > 0
                   return true
               end
           end
           false
       end

179

answered Nov 04 '22 05:11

David Varela

You could have avoided the nesting also with something like this, this is also a bit faster than your intersects(u::DataStructures.IntSet, v::DataStructures.IntSet) method, because of @inbounds:

function intersects4(u::DataStructures.IntSet, v::DataStructures.IntSet)
    ch_u, ch_v = u.bits.chunks, v.bits.chunks
    for i in eachindex(length(ch_u) > length(ch_v) ? ch_u : ch_v)
        @inbounds if (u.inverse && v.inverse && ~ch_u[i] & ~ch_v[i] > 0) ||
           (u.inverse && ~ch_u[i] & ch_v[i] > 0) ||
           (v.inverse && ch_u[i] & ~ch_v[i] > 0) ||
           (ch_u[i] & ch_v[i] > 0) 
            return true
        end
    end
    return false
end

Are you sure there can not be a BoundsError if ch_u's lenght is larger than ch_v's and the function happens to run the whole loop? I did for i in eachindex(length(ch_u) > length(ch_v) ? ch_u : ch_v) just in case, because I haven't tested your function thoroughly.

Great answer by @David Varela (I'll call his function intersects3), you can get even a bit faster than intersect4 above if you make sure there will be no BoundsError and use @inbounds, like this:

function intersects5(u::DataStructures.IntSet, v::DataStructures.IntSet)
    op_u(x) = u.inverse ? ~x : x
    op_v(x) = v.inverse ? ~x : x

    ch_u, ch_v = u.bits.chunks, v.bits.chunks
    for i in eachindex(length(ch_u) > length(ch_v) ? ch_u : ch_v)
        @inbounds if op_u(ch_u[i]) & op_v(ch_v[i]) > 0
            return true
        end
    end
    return false
end

julia> begin
           @btime intersects($x3, $y3)   # nested ifs and bound checks
           @btime intersects2($x3, $y3)  # with non inline-able ops
           @btime intersects3($x3, $y3)  # David's improvement
           @btime intersects4($x3, $y3)  # without ifs and with @inbounds
           @btime intersects5($x3, $y3)  # David's improvement with @inbounds
       end
7.184 ns (0 allocations: 0 bytes)
87.633 ns (5 allocations: 80 bytes)
6.500 ns (0 allocations: 0 bytes)
3.763 ns (0 allocations: 0 bytes)
3.079 ns (0 allocations: 0 bytes)

Also in your code:

op_u = if u.inverse x -> ~x else x -> x end

You could have also done:

op_u = u.inverse ? (~) : identity

As ~ operator is just a function and an identity function already exists, anyway David's answer is better (because this won't inline either) and intersects5 is the fastest and more succinct of all, just wanted to mention it though.

answered Nov 04 '22 05:11

HarmonicaMuse

Related questions
                            
                                Function signature not found despite showing with methods(...)
                            
                                Memory usage blows up when iterating over array
                            
                                Julia: how do I convert a symbolic expression to a function?
                            
                                Julia: using splat to pass along arguments
                            
                                How to make a custom colormap using PyPlot (not matplotlib proper)
                            
                                Powering a Math constant in Julia is slow
                            
                                How to get the milliseconds from now() in Julia 0.4-pre?
                            
                                Why doesn't rand work with AbstractFloat?
                            
                                Abstract types in R
                            
                                Inner constructor of parametric type
                            
                                Rewriting Mixed effects model formula from R (lme4) to Julia
                            
                                Prevent overwriting modules in Julia parallelization
                            
                                Julia: How to save a figure without plotting/displaying it in PyPlot?
                            
                                Julia - C interface with nonfundamental types
                            
                                Color of errorbars with plots.jl
                            
                                Which is the alternative to "workspace()" in julia 1.0?
                            
                                how to kill a task in Julia?
                            
                                Julia 1.0.0: What does the `=>` operator do?
                            
                                How to use Julia to compute the pearson correlation coefficient with p-value?
                            
                                How to tell LLVM that it can optimize away stores?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With